[ https://issues.apache.org/jira/browse/HADOOP-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661386#action_12661386 ]
Suresh Srinivas commented on HADOOP-3628: ----------------------------------------- Some comments on the document: -Section 1.1 - Secondary NameNode is listed as "failover for the NameNode". It is a check pointer node currently. -Section 1.1 - DataNode does not have notion of safemode. Same applies to section 2.1 Liveness discussion. -Section 3.1 -Should we have a state that captures out of service for maintenance? -Section 3.1.1 -I am not sure why initialized state is required. Initialization could be part of starting up? I prefer merging states INITIALIZED and STARTED and calling it as STARTING. -I am not clear on how Failed state transitions to Terminated. If failed state transitions to terminated, the fact that the service failed will no longer available? Comments on the code: -Statemachine pattern could be used as follows using service as the context. The base ServiceState could have default implementation or force the concrete states to implement the messages {{start, stop, terminate}} and {{ping}}. Each state can have enter and exit to implement necessary functionality when entering and exiting a state. {noformat} private abstract class ServiceState { abstract void enter(Service service, ServiceState newState); abstract void exit(Service service); void start(Service service) { // defalt implementation to either ignore or throw exception // I prefer not ignoring and throwing exception if the service is // already started } // Default implementation void stop(Service service) { service.enterState(Service.terminatedState); } ServiceStatus ping(); { } void terminate(Service service); abstract void getName(); } } {noformat} The concrete states override only handling the messages relevant to them. For example, method start is overridden only by the CreatedState. {noformat} class CreatedState extends ServiceState { void enter(Service service, ServiceState newState); void exit(Service service); void start(); void terminate(); } class LiveState(Service service) { } {noformat} -With the above implementation, setServiceState will change to: {noformat} protected final void setServiceState(ServiceState serviceState) { { this.serviceState.exit(); this.serviceState = serviceState; this.serviceState.enter(); } {noformat} protected final void setServiceState(ServiceState serviceState) { This gives each state a chance to do things needed during entry and cleanup during exit. -A singleton instance of each state is used. The advantages of doing the above is: 1. I am currently looking at introducing HA to NameNode and JobTrackers. When doing this, I can easily introduce new states such as Active and Standby, by introducing say, {{HAService}} deriving from {{Service}} and reusing a lot of the code and the base {{ServiceState}} 2. Checks done in methods such as start, ping for appropriate state can be avoided, since only appropriate states handle the methods and the rest get default implementation from the super class ServiceState. 3. isValidStateTransition checks can also be moved to {{enter}} method of each state. -In the comment for the class {{Service}}, if I understand right, it looks like {{innerStart}} is not synchronous. That is, when it returns the service may not have started completely. I prefer it to be synchronous, where the method blocks for the service to start before returning (if the start is done in other threads). Otherwise, since exception cannot be thrown, in case start failures, enterFailedState needs to be called by the {{Service}} subclass. The comments need to indicate this. -{{start}} method is honored even if the current state is {{TERMINATED}}. Is this intended? -Why are we using IOException to indicate exceptional conditions from methods such as innerStart etc.? -Since {{ping}} method is called from an external entity to check the service state, depending on it to enter FailedState does not seem right. -{{innerStart}}, {{innerPing}}, {{innerClose}}, {{getServiceName}} etc need to be abstract methods to force the subclasses to provide concrete implementations. -I am not sure why two flovors of enterState method is needed. The second one with {{entryState}} parameter seems unnecessary. Instead should we allow transition from {{LIVE}} state to {{STARTED}}, as that is the only case where this is used. -verifyState with single expected state seems better. vertifyState with {{expected1, expected2}} can be avoided by calling verifyState twice with two expected state. That seems simpler. -I prefer not supporting Datanode.shutdown method. Instead use the Service method Datanode.terminate to avoid different public methods doing the same thing. Same applies to other classes inheriting from {{Service}}. This could be another Jira. -Should innerPing throw LivenessException? -In {{ping}} method, the {{ServiceStatus}} returned, has throwable in states other than LIVE. The comment seems to indicate it is being done only for FAILED and TERMINATED state. It will happen for CREATED, STARTED states as well. -It is good not to pass the {{ServiceState}} to {{onInnerPingFailure}} just to get in the passed {{ServiceState}} , the FAILED state along with the exception. It could be done in {{ping}} method. This helps in doing one less thing in {{onInnerPingFailure}} when overriding it. -Since the change is quite big, I will take another stab at the review, if you submit an updated patch. > Add a lifecycle interface for Hadoop components: namenodes, job clients, etc. > ----------------------------------------------------------------------------- > > Key: HADOOP-3628 > URL: https://issues.apache.org/jira/browse/HADOOP-3628 > Project: Hadoop Core > Issue Type: Improvement > Components: dfs, mapred > Affects Versions: 0.20.0 > Reporter: Steve Loughran > Assignee: Steve Loughran > Attachments: AbstractHadoopComponent.java, hadoop-3628.patch, > hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, > hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, > hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, > hadoop-lifecycle.pdf, hadoop-lifecycle.sxw > > > I'd like to propose we have a standard interface for hadoop components, the > things that get started or stopped when you bring up a namenode. currently, > some of these classes have a stop() or shutdown() method, with no standard > name/interface, but no way of seeing if they are live, checking their health > of shutting them down reliably. Indeed, there is a tendency for the spawned > threads to not want to die; to require the entire process to be killed to > stop the workers. > Having a standard interface would make it easier for > * management tools to manage the different things > * monitoring the state of things > * subclassing > The latter is interesting as right now TaskTracker and JobTracker start up > threads in their constructor; that's very dangerous as subclasses may have > their methods called before they are full initialised. Adding this interface > would be the right time to clean up the startup process so that subclassing > is less risky. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.