[
https://issues.apache.org/jira/browse/HADOOP-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624148#action_12624148
]
Konstantin Shvachko commented on HADOOP-3628:
---------------------------------------------
Steve, thanks for the slides. Lots of interesting stuff.
> HDFS cluster is "live" if enough datanodes are connected to the namenode,
> where "enough" is a matter of personal preference.
Sounds like safe-mode, which precisely defines what "enough" is.
We consider the name-node alive when it leaves safe mode.
Some links can be found in
http://wiki.apache.org/hadoop/FAQ#12
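As a rough illustration of how safe mode pins down "enough": the name-node stays in safe mode until a configurable fraction of its blocks has been reported by data-nodes. The sketch below shows only that idea. The class, method, and parameter names are hypothetical, the safe mode extension period is ignored, and this is not the name-node's actual code.

```java
// Hypothetical sketch of the safe mode exit criterion; not Hadoop's real implementation.
class SafeModeCheck {
    /**
     * @param reportedBlocks blocks reported by data-nodes so far
     * @param totalBlocks    blocks the name-node knows about
     * @param threshold      required fraction of reported blocks, between 0 and 1
     * @return true if the reported fraction has reached the threshold
     */
    static boolean canLeaveSafeMode(long reportedBlocks, long totalBlocks, double threshold) {
        if (totalBlocks == 0) {
            return true; // an empty namespace has nothing to wait for
        }
        return (double) reportedBlocks / totalBlocks >= threshold;
    }
}
```

The point is that "enough" becomes a single number in the configuration rather than a judgement made by each caller.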
> How do we represent these states to the caller?
> One option: fail the ping() with explicit exceptions
This is what I do not understand. Why do you need ping() to fail if you can
instead call getServiceState() and get more detailed information on the node
state, rather than just the boolean failed / not failed that ping() provides?
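To make the question concrete, here is a minimal sketch of the relationship between the two calls. Only ping() and getServiceState() come from the discussion. The Service interface, the FixedStateService class, and the choice of IOException are assumptions for illustration, and the enum lists only the states named in this comment, so the patch may define others.

```java
import java.io.IOException;

// States named in this comment; the real patch may define more.
enum ServiceState { INITIALIZED, STARTED, LIVE, FAILED, TERMINATED }

// Hypothetical interface, not the one in the patch.
interface Service {
    /** Returns the current state, which the caller can inspect in detail. */
    ServiceState getServiceState();

    /** Collapses the state to pass/fail: throws unless the service is LIVE. */
    default void ping() throws IOException {
        ServiceState state = getServiceState();
        if (state != ServiceState.LIVE) {
            throw new IOException("Service is not live: " + state);
        }
    }
}

// Trivial implementation used only to exercise the interface.
class FixedStateService implements Service {
    private final ServiceState state;

    FixedStateService(ServiceState state) {
        this.state = state;
    }

    @Override
    public ServiceState getServiceState() {
        return state;
    }
}
```

In this sketch ping() carries no information that getServiceState() does not already provide. A caller that wants to tell STARTED from FAILED has to ask for the state anyway.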
> A Datanode is only live when it is connected to a name node.
It would be good to have precise definitions like that for the states of
different servers (in JavaDoc).
Sometimes it is not clear what the difference is between INITIALIZED, STARTED,
and LIVE, or why it matters whether the server was TERMINATED or FAILED.
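Something along these lines is what I have in mind for the JavaDoc. The wording of each definition is my assumption for the sake of illustration, not the semantics the patch actually implements, and LifecycleState and isTerminal() are hypothetical names.

```java
/** Hypothetical lifecycle states; every definition here is an illustrative assumption. */
enum LifecycleState {
    /** Configuration has been read and resources allocated; no threads are running yet. */
    INITIALIZED,

    /** Worker threads have been started; the service may not yet be able to do useful work. */
    STARTED,

    /** Fully operational, e.g. a datanode that is connected to a name node. */
    LIVE,

    /** Stopped because of an error; the cause should be available to the caller. */
    FAILED,

    /** Stopped on request, without an error. */
    TERMINATED;

    /** Returns true for states that (by assumption) a service never leaves. */
    boolean isTerminal() {
        return this == FAILED || this == TERMINATED;
    }
}
```

With definitions like these, a management tool could treat FAILED as "restart and report" and TERMINATED as "leave alone", which would be one reason for keeping the two apart.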
> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -----------------------------------------------------------------------------
>
> Key: HADOOP-3628
> URL: https://issues.apache.org/jira/browse/HADOOP-3628
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs, mapred
> Affects Versions: 0.19.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: AbstractHadoopComponent.java, hadoop-3628.patch,
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch,
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch
>
>
> I'd like to propose we have a standard interface for hadoop components, the
> things that get started or stopped when you bring up a namenode. Currently,
> some of these classes have a stop() or shutdown() method, with no standard
> name/interface, and there is no way of seeing if they are live, checking
> their health, or shutting them down reliably. Indeed, there is a tendency
> for the spawned threads to not want to die, requiring the entire process to
> be killed to stop the workers.
> Having a standard interface would make it easier for
> * management tools to manage the different things
> * monitoring the state of things
> * subclassing
> The last is interesting as right now TaskTracker and JobTracker start up
> threads in their constructor; that's very dangerous as subclasses may have
> their methods called before they are fully initialised. Adding this interface
> would be the right time to clean up the startup process so that subclassing
> is less risky.
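The subclassing hazard described above can be reproduced without any threads. In this sketch the base constructor calls an overridable method directly, as a stand-in for a thread started in the constructor doing the same. All class and method names are hypothetical and none of this is TaskTracker or JobTracker code.

```java
// Unsafe: work that reaches overridable methods begins inside the constructor.
class UnsafeBase {
    final String seenInConstructor;

    UnsafeBase() {
        // Stand-in for a thread started here: it can call overridden methods
        // before the subclass constructor has assigned its fields.
        seenInConstructor = describe();
    }

    String describe() {
        return "base";
    }
}

class UnsafeChild extends UnsafeBase {
    private final String label;

    UnsafeChild(String label) {
        super();            // describe() runs here, while label is still null
        this.label = label;
    }

    @Override
    String describe() {
        return "child:" + label;
    }
}

// Safer: construction only builds the object; an explicit start() begins the work.
class SafeBase {
    private String seenAtStart;

    void start() {
        seenAtStart = describe();
    }

    String seenAtStart() {
        return seenAtStart;
    }

    String describe() {
        return "base";
    }
}

class SafeChild extends SafeBase {
    private final String label;

    SafeChild(String label) {
        this.label = label;
    }

    @Override
    String describe() {
        return "child:" + label;
    }
}
```

With a real thread the failure becomes a race rather than a certainty, which makes it harder to spot. Moving the work into a separate start() call, as a lifecycle interface would encourage, means subclass constructors always finish first.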