[ 
https://issues.apache.org/jira/browse/HADOOP-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661386#action_12661386
 ] 

Suresh Srinivas commented on HADOOP-3628:
-----------------------------------------

Some comments on the document:
-Section 1.1 - Secondary NameNode is listed as "failover for the NameNode". It 
is a check pointer node currently.
-Section 1.1 - DataNode does not have notion of safemode. Same applies to 
section 2.1 Liveness discussion.
-Section 3.1 
  -Should we have a state that captures out of service for maintenance?
  -Section 3.1.1 -I am not sure why initialized state is required.  
Initialization could be part of starting up? I prefer merging states 
INITIALIZED and STARTED and calling it as STARTING.
  -I am not clear on how Failed state transitions to Terminated. If failed 
state transitions to terminated, the fact that the service failed will no 
longer available?

Comments on the code:
-Statemachine pattern could be used as follows using service as the context. 
The base ServiceState could have default implementation or force the concrete 
states to implement the messages {{start, stop, terminate}} and {{ping}}. Each 
state can have enter and exit to implement necessary functionality when 
entering and exiting a state.

{noformat}
   private abstract class ServiceState {

    abstract void enter(Service service, ServiceState newState);

    abstract void exit(Service service);

    void start(Service service) {
      // defalt implementation to either ignore or throw exception
      // I prefer not ignoring and throwing exception if the service is
      // already started
    } 

    // Default implementation
    void stop(Service service) {
      service.enterState(Service.terminatedState);
    }

    ServiceStatus ping(); {
    }

    void terminate(Service service);

    abstract void getName();
  }
}
{noformat}

  The concrete states override only handling the messages relevant to them.  
For example, method start is overridden only by the CreatedState.

{noformat}
  class CreatedState extends ServiceState {
    void enter(Service service, ServiceState newState);
    void exit(Service service);
    void start();
    void terminate();
  }

  class LiveState(Service service) {
  }

{noformat}

-With the above implementation, setServiceState will change to:
{noformat}
  protected final void setServiceState(ServiceState serviceState) {
  {
    this.serviceState.exit();
    this.serviceState = serviceState;
    this.serviceState.enter();
  }
{noformat}
  protected final void setServiceState(ServiceState serviceState) {

  This gives each state a chance to do things needed during entry and cleanup 
during exit.

-A singleton instance of each state is used.

The advantages of doing the above is:
1. I am currently looking at introducing HA to NameNode and JobTrackers. When 
doing this, I can easily introduce new states such as Active and Standby, by 
introducing say, {{HAService}} deriving from {{Service}} and reusing a lot of 
the code and the base {{ServiceState}}
2. Checks done in methods such as start, ping for appropriate state can be 
avoided, since only appropriate states handle the methods and the rest get 
default implementation from the super class ServiceState.
3. isValidStateTransition checks can also be moved to {{enter}} method of each 
state.

-In the comment for the class {{Service}}, if I understand right, it looks like 
{{innerStart}} is not synchronous. That is, when it returns the service may not 
have started completely. I prefer it to be synchronous, where the method blocks 
for the service to start before returning (if the start is done in other 
threads). Otherwise, since exception cannot be thrown, in case start failures, 
enterFailedState needs to be called by the {{Service}} subclass. The comments 
need to indicate this.

-{{start}} method is honored even if the current state is {{TERMINATED}}. Is 
this intended?
-Why are we using IOException to indicate exceptional conditions from methods 
such as innerStart etc.?

-Since {{ping}} method is called from an external entity to check the service 
state, depending on it to enter FailedState does not seem right.

-{{innerStart}}, {{innerPing}}, {{innerClose}}, {{getServiceName}} etc need to 
be abstract methods to force the subclasses to provide concrete implementations.

-I am not sure why two flovors of enterState method is needed. The second one 
with {{entryState}} parameter seems unnecessary. Instead should we allow 
transition from {{LIVE}} state to {{STARTED}}, as that is the only case where 
this is used.

-verifyState with single expected state seems better. vertifyState with 
{{expected1, expected2}} can be avoided by calling verifyState twice with two 
expected state. That seems simpler.

-I prefer not supporting Datanode.shutdown method. Instead use the Service 
method Datanode.terminate to avoid different public methods doing the same 
thing. Same applies to other classes inheriting from {{Service}}. This could be 
another Jira.

-Should innerPing throw LivenessException?

-In {{ping}} method, the {{ServiceStatus}} returned, has throwable in states 
other than LIVE. The comment seems to indicate it is being done only for FAILED 
and TERMINATED state. It will happen for CREATED, STARTED states as well.

-It is good not to pass the {{ServiceState}} to {{onInnerPingFailure}} just to 
get in the passed {{ServiceState}} , the FAILED state along with the exception. 
It could be done in {{ping}} method. This helps in doing one less thing in 
{{onInnerPingFailure}} when overriding it.

-Since the change is quite big, I will take another stab at the review, if you 
submit an updated patch.





> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-3628
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3628
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs, mapred
>    Affects Versions: 0.20.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: AbstractHadoopComponent.java, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-lifecycle.pdf, hadoop-lifecycle.sxw
>
>
> I'd like to propose we have a standard interface for hadoop components, the 
> things that get started or stopped when you bring up a namenode. currently, 
> some of these classes have a stop() or shutdown() method, with no standard 
> name/interface, but no way of seeing if they are live, checking their health 
> of shutting them down reliably. Indeed, there is a tendency for the spawned 
> threads to not want to die; to require the entire process to be killed to 
> stop the workers. 
> Having a standard interface would make it easier for 
>  * management tools to manage the different things
>  * monitoring the state of things
>  * subclassing
> The latter is interesting as right now TaskTracker and JobTracker start up 
> threads in their constructor; that's very dangerous as subclasses may have 
> their methods called before they are full initialised. Adding this interface 
> would be the right time to clean up the startup process so that subclassing 
> is less risky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to