[jira] [Comment Edited] (HDDS-351) Add chill mode state to SCM

Ajay Kumar (JIRA) Tue, 28 Aug 2018 12:19:42 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595436#comment-16595436
 ]


Ajay Kumar edited comment on HDDS-351 at 8/28/18 7:18 PM:
----------------------------------------------------------

[~anu] thanks for detailed review. Please see my response inline.
{quote}EventQueue.java - I am afraid adding removeHandler would cause race 
conditions to the code since the Data structures do not seem to be protected.
{quote}
Changed corresponding data structure to cover this. Nested list and map are 
still unsynchronized. Since removeHandler is limited to chill mode i think this 
should be ok. Alternatively we can get rid of these new API's and let 
SCMChillModeManager handle register events. (will need a minor change in 
SCMChillModeManager)

{quote}nit: MiniOzoneClusterImpl.java - "Stopping down" ==> stopping
{quote}
done
{quote}SCMChillModeManager.java: Line 72#run() - Why do we have a run function 
in this code path? That thread starts up and immediately dies. I don't 
understand why we need to run this in the context of a thread. From a quick 
reading, it looks like we could have done this in a constructor.
{quote}
Basic idea was to handle rules which doesn't listen for specific events but 
just monitor state. Removed for time being.
{quote}SCMChillModeManager.java: Line 75 - This implies that 
getScmContainerManager is finished initializing. If that is the contract that 
we want to support, we must make it explicit. We need to create a flag in 
ScmContainerManager that indicates that is done with init. We need to assert 
that info here and take a dependency on that; otherwise we can avoid depending 
on ContainerManager.
{quote}
This explicit contract is no better than checking for instance being not null 
as for all practical purpose we can't check that flag without instance being 
initialized. If instance is not null than it implicitly means container data 
structure is already initialized. Added not null check in new patch.
{quote}exitRules[0] - This seems like a poor coding practice. If we want to add 
rules that need to evaluate, I suggest that we create a list, and invoke the 
add function. The problem is not in this line. The problem is in line 118, 
where if someone changes the Array index the code will break. That kind of 
dependency on an index of an array and function pointer is very brittle.
 Already commented this seems like a bad pattern of code.
 exitRules[0].process(nodeRegistrationContainerReport);
{quote}
Initially it was a list but changed it to array as size of rules is 
definitive/static. Ya, it looks little brittle but issue remains same in list 
as well as you have to access members by index. But i agree from readability 
perspective this can be improved so changed it to a map.
{quote}private double maxContainer; why is this double? we are assigning an 
"int" to this later.
{quote}
do avoid casting during cutoff calculations. 
{quote}Line 82: validateChillModeExitRules – if we have rule framework, then 
hard coding if (maxContainer == 0) exitChillMode seems a violation of the 
framework that we are building up.
{quote}
good find, this should move inside inner class.

 
{quote}maxContainer – why is this variable being accessed from an inner class? 
Why are we not passing this as a value into the inner class? Why bind to the 
outer class?
{quote}
moved to inner class.
{quote}((ContainerChillModeRule) exitRules[0]).clear(); – why?
{quote}
Renamed it to cleanup and moved it to interface to allow optional cleanup of 
resources on chill mode exit.
{quote}The model of exit criteria seems little off in my mind. I think we 
should have an object that takes the current state of the cluster and the 
expected state of the cluster – Nodes, Closed Containers, Open Containers, and 
Pipelines. The expected and current state will allow these *ExitRule classes to 
implement code that decides if we should exit from chill mode.
{quote}
Individual inner rule classes will maintain there own state, we should just 
query them if they have meet the exit criteria or not. Thats what validate 
function does.
{quote}SCMDatanodeProtocolServer.java: Shouldn't the 
{{eventPublisher.fireEvent}} be inside the if?
{quote}
agree, moved it inside.
{quote}What is the difference between NODE_REGISTRATION_CONT_REPORT and just 
CONTAINER_REPORT?
{quote}
In short NODE_REGISTRATION_CONT_REPORT is generated during node registration 
while CONTAINER_REPORT is emitted continuously on heartbeats.
Since we are interested in knowing initial container reports, we are getting 
them from register instead of heartbeat. This  also avoid any special handling 
of redundant containerReports in case we choose to use CONTAINER_REPORT.
{quote}StorageContainerManager.java – why remove the START_REPLICATION event?
{quote}
This is now transmitted on chill mode exit.


was (Author: ajayydv):
[~anu] thanks for detailed review. Please see my response inline.
{quote}EventQueue.java - I am afraid adding removeHandler would cause race 
conditions to the code since the Data structures do not seem to be protected.
{quote}
Changed corresponding data structure to cover this.
{quote}nit: MiniOzoneClusterImpl.java - "Stopping down" ==> stopping
{quote}
done
{quote}SCMChillModeManager.java: Line 72#run() - Why do we have a run function 
in this code path? That thread starts up and immediately dies. I don't 
understand why we need to run this in the context of a thread. From a quick 
reading, it looks like we could have done this in a constructor.
{quote}
Basic idea was to handle rules which doesn't listen for specific events but 
just monitor state. Removed for time being.
{quote}SCMChillModeManager.java: Line 75 - This implies that 
getScmContainerManager is finished initializing. If that is the contract that 
we want to support, we must make it explicit. We need to create a flag in 
ScmContainerManager that indicates that is done with init. We need to assert 
that info here and take a dependency on that; otherwise we can avoid depending 
on ContainerManager.
{quote}
This explicit contract is no better than checking for instance being not null 
as for all practical purpose we can't check that flag without instance being 
initialized. If instance is not null than it implicitly means container data 
structure is already initialized. Added not null check in new patch.
{quote}exitRules[0] - This seems like a poor coding practice. If we want to add 
rules that need to evaluate, I suggest that we create a list, and invoke the 
add function. The problem is not in this line. The problem is in line 118, 
where if someone changes the Array index the code will break. That kind of 
dependency on an index of an array and function pointer is very brittle.
 Already commented this seems like a bad pattern of code.
 exitRules[0].process(nodeRegistrationContainerReport);
{quote}
Initially it was a list but changed it to array as size of rules is 
definitive/static. Ya, it looks little brittle but issue remains same in list 
as well as you have to access members by index. But i agree from readability 
perspective this can be improved so changed it to a map.
{quote}private double maxContainer; why is this double? we are assigning an 
"int" to this later.
{quote}
do avoid casting during cutoff calculations. 
{quote}Line 82: validateChillModeExitRules – if we have rule framework, then 
hard coding if (maxContainer == 0) exitChillMode seems a violation of the 
framework that we are building up.
{quote}
good find, this should move inside inner class.

 
{quote}maxContainer – why is this variable being accessed from an inner class? 
Why are we not passing this as a value into the inner class? Why bind to the 
outer class?
{quote}
moved to inner class.
{quote}((ContainerChillModeRule) exitRules[0]).clear(); – why?
{quote}
Renamed it to cleanup and moved it to interface to allow optional cleanup of 
resources on chill mode exit.
{quote}The model of exit criteria seems little off in my mind. I think we 
should have an object that takes the current state of the cluster and the 
expected state of the cluster – Nodes, Closed Containers, Open Containers, and 
Pipelines. The expected and current state will allow these *ExitRule classes to 
implement code that decides if we should exit from chill mode.
{quote}
Individual inner rule classes will maintain there own state, we should just 
query them if they have meet the exit criteria or not. Thats what validate 
function does.
{quote}SCMDatanodeProtocolServer.java: Shouldn't the 
{{eventPublisher.fireEvent}} be inside the if?
{quote}
agree, moved it inside.
{quote}What is the difference between NODE_REGISTRATION_CONT_REPORT and just 
CONTAINER_REPORT?
{quote}
In short NODE_REGISTRATION_CONT_REPORT is generated during node registration 
while CONTAINER_REPORT is emitted continuously on heartbeats.
Since we are interested in knowing initial container reports, we are getting 
them from register instead of heartbeat. This  also avoid any special handling 
of redundant containerReports in case we choose to use CONTAINER_REPORT.
{quote}StorageContainerManager.java – why remove the START_REPLICATION event?
{quote}
This is now transmitted on chill mode exit.

> Add chill mode state to SCM
> ---------------------------
>
>                 Key: HDDS-351
>                 URL: https://issues.apache.org/jira/browse/HDDS-351
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Ajay Kumar
>            Assignee: Ajay Kumar
>            Priority: Major
>             Fix For: 0.2.1
>
>         Attachments: HDDS-351.00.patch, HDDS-351.01.patch
>
>
> Add chill mode state to SCM



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (HDDS-351) Add chill mode state to SCM

Reply via email to