[
https://issues.apache.org/jira/browse/HDDS-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595436#comment-16595436
]
Ajay Kumar edited comment on HDDS-351 at 8/28/18 7:18 PM:
----------------------------------------------------------
[~anu] thanks for detailed review. Please see my response inline.
{quote}EventQueue.java - I am afraid adding removeHandler would cause race
conditions to the code since the Data structures do not seem to be protected.
{quote}
Changed corresponding data structure to cover this. Nested list and map are
still unsynchronized. Since removeHandler is limited to chill mode i think this
should be ok. Alternatively we can get rid of these new API's and let
SCMChillModeManager handle register events. (will need a minor change in
SCMChillModeManager)
{quote}nit: MiniOzoneClusterImpl.java - "Stopping down" ==> stopping
{quote}
done
{quote}SCMChillModeManager.java: Line 72#run() - Why do we have a run function
in this code path? That thread starts up and immediately dies. I don't
understand why we need to run this in the context of a thread. From a quick
reading, it looks like we could have done this in a constructor.
{quote}
Basic idea was to handle rules which doesn't listen for specific events but
just monitor state. Removed for time being.
{quote}SCMChillModeManager.java: Line 75 - This implies that
getScmContainerManager is finished initializing. If that is the contract that
we want to support, we must make it explicit. We need to create a flag in
ScmContainerManager that indicates that is done with init. We need to assert
that info here and take a dependency on that; otherwise we can avoid depending
on ContainerManager.
{quote}
This explicit contract is no better than checking for instance being not null
as for all practical purpose we can't check that flag without instance being
initialized. If instance is not null than it implicitly means container data
structure is already initialized. Added not null check in new patch.
{quote}exitRules[0] - This seems like a poor coding practice. If we want to add
rules that need to evaluate, I suggest that we create a list, and invoke the
add function. The problem is not in this line. The problem is in line 118,
where if someone changes the Array index the code will break. That kind of
dependency on an index of an array and function pointer is very brittle.
Already commented this seems like a bad pattern of code.
exitRules[0].process(nodeRegistrationContainerReport);
{quote}
Initially it was a list but changed it to array as size of rules is
definitive/static. Ya, it looks little brittle but issue remains same in list
as well as you have to access members by index. But i agree from readability
perspective this can be improved so changed it to a map.
{quote}private double maxContainer; why is this double? we are assigning an
"int" to this later.
{quote}
do avoid casting during cutoff calculations.
{quote}Line 82: validateChillModeExitRules – if we have rule framework, then
hard coding if (maxContainer == 0) exitChillMode seems a violation of the
framework that we are building up.
{quote}
good find, this should move inside inner class.
{quote}maxContainer – why is this variable being accessed from an inner class?
Why are we not passing this as a value into the inner class? Why bind to the
outer class?
{quote}
moved to inner class.
{quote}((ContainerChillModeRule) exitRules[0]).clear(); – why?
{quote}
Renamed it to cleanup and moved it to interface to allow optional cleanup of
resources on chill mode exit.
{quote}The model of exit criteria seems little off in my mind. I think we
should have an object that takes the current state of the cluster and the
expected state of the cluster – Nodes, Closed Containers, Open Containers, and
Pipelines. The expected and current state will allow these *ExitRule classes to
implement code that decides if we should exit from chill mode.
{quote}
Individual inner rule classes will maintain there own state, we should just
query them if they have meet the exit criteria or not. Thats what validate
function does.
{quote}SCMDatanodeProtocolServer.java: Shouldn't the
{{eventPublisher.fireEvent}} be inside the if?
{quote}
agree, moved it inside.
{quote}What is the difference between NODE_REGISTRATION_CONT_REPORT and just
CONTAINER_REPORT?
{quote}
In short NODE_REGISTRATION_CONT_REPORT is generated during node registration
while CONTAINER_REPORT is emitted continuously on heartbeats.
Since we are interested in knowing initial container reports, we are getting
them from register instead of heartbeat. This also avoid any special handling
of redundant containerReports in case we choose to use CONTAINER_REPORT.
{quote}StorageContainerManager.java – why remove the START_REPLICATION event?
{quote}
This is now transmitted on chill mode exit.
was (Author: ajayydv):
[~anu] thanks for detailed review. Please see my response inline.
{quote}EventQueue.java - I am afraid adding removeHandler would cause race
conditions to the code since the Data structures do not seem to be protected.
{quote}
Changed corresponding data structure to cover this.
{quote}nit: MiniOzoneClusterImpl.java - "Stopping down" ==> stopping
{quote}
done
{quote}SCMChillModeManager.java: Line 72#run() - Why do we have a run function
in this code path? That thread starts up and immediately dies. I don't
understand why we need to run this in the context of a thread. From a quick
reading, it looks like we could have done this in a constructor.
{quote}
Basic idea was to handle rules which doesn't listen for specific events but
just monitor state. Removed for time being.
{quote}SCMChillModeManager.java: Line 75 - This implies that
getScmContainerManager is finished initializing. If that is the contract that
we want to support, we must make it explicit. We need to create a flag in
ScmContainerManager that indicates that is done with init. We need to assert
that info here and take a dependency on that; otherwise we can avoid depending
on ContainerManager.
{quote}
This explicit contract is no better than checking for instance being not null
as for all practical purpose we can't check that flag without instance being
initialized. If instance is not null than it implicitly means container data
structure is already initialized. Added not null check in new patch.
{quote}exitRules[0] - This seems like a poor coding practice. If we want to add
rules that need to evaluate, I suggest that we create a list, and invoke the
add function. The problem is not in this line. The problem is in line 118,
where if someone changes the Array index the code will break. That kind of
dependency on an index of an array and function pointer is very brittle.
Already commented this seems like a bad pattern of code.
exitRules[0].process(nodeRegistrationContainerReport);
{quote}
Initially it was a list but changed it to array as size of rules is
definitive/static. Ya, it looks little brittle but issue remains same in list
as well as you have to access members by index. But i agree from readability
perspective this can be improved so changed it to a map.
{quote}private double maxContainer; why is this double? we are assigning an
"int" to this later.
{quote}
do avoid casting during cutoff calculations.
{quote}Line 82: validateChillModeExitRules – if we have rule framework, then
hard coding if (maxContainer == 0) exitChillMode seems a violation of the
framework that we are building up.
{quote}
good find, this should move inside inner class.
{quote}maxContainer – why is this variable being accessed from an inner class?
Why are we not passing this as a value into the inner class? Why bind to the
outer class?
{quote}
moved to inner class.
{quote}((ContainerChillModeRule) exitRules[0]).clear(); – why?
{quote}
Renamed it to cleanup and moved it to interface to allow optional cleanup of
resources on chill mode exit.
{quote}The model of exit criteria seems little off in my mind. I think we
should have an object that takes the current state of the cluster and the
expected state of the cluster – Nodes, Closed Containers, Open Containers, and
Pipelines. The expected and current state will allow these *ExitRule classes to
implement code that decides if we should exit from chill mode.
{quote}
Individual inner rule classes will maintain there own state, we should just
query them if they have meet the exit criteria or not. Thats what validate
function does.
{quote}SCMDatanodeProtocolServer.java: Shouldn't the
{{eventPublisher.fireEvent}} be inside the if?
{quote}
agree, moved it inside.
{quote}What is the difference between NODE_REGISTRATION_CONT_REPORT and just
CONTAINER_REPORT?
{quote}
In short NODE_REGISTRATION_CONT_REPORT is generated during node registration
while CONTAINER_REPORT is emitted continuously on heartbeats.
Since we are interested in knowing initial container reports, we are getting
them from register instead of heartbeat. This also avoid any special handling
of redundant containerReports in case we choose to use CONTAINER_REPORT.
{quote}StorageContainerManager.java – why remove the START_REPLICATION event?
{quote}
This is now transmitted on chill mode exit.
> Add chill mode state to SCM
> ---------------------------
>
> Key: HDDS-351
> URL: https://issues.apache.org/jira/browse/HDDS-351
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Ajay Kumar
> Assignee: Ajay Kumar
> Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-351.00.patch, HDDS-351.01.patch
>
>
> Add chill mode state to SCM
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]