[
https://issues.apache.org/jira/browse/HADOOP-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622853#action_12622853
]
Steve Loughran commented on HADOOP-3628:
----------------------------------------
I am still working on this (albeit with a break for some travel), but I've
temporarily stopped syncing with SVN_HEAD while I try to get my local tests
working before the Hadoop UK event; some of the lessons there are interesting.
One of my failing tests runs filesystem operations against a cluster that has
just been brought up. The problem is that the namenode considers itself live
once its own filesystem is happy, but for FS operations HDFS as a whole isn't
live until >1 datanode is up. It's not enough to block until the namenode
state==LIVE and ping() is good; I have to probe the number of datanodes and
block until it is adequate. I've been doing that in my own codebase by
retrofitting a method onto my subclassed namenode and jobtracker to get their
worker count, then exporting that via RMI.
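A minimal sketch of that kind of probe, assuming the worker count can be read
through some callback. WorkerWait and awaitWorkers are hypothetical names for
illustration, not part of Hadoop or of the attached patch:

```java
import java.util.function.IntSupplier;

/** Hypothetical helper; none of these names come from Hadoop. */
final class WorkerWait {
    private WorkerWait() {}

    /**
     * Polls workerCount until it reports at least minWorkers, or until
     * timeoutMillis has elapsed.
     *
     * @return true if enough workers were seen in time, false on timeout
     *         or if the calling thread was interrupted while waiting
     */
    static boolean awaitWorkers(IntSupplier workerCount, int minWorkers,
                                long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (workerCount.getAsInt() >= minWorkers) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call something like awaitWorkers(countSupplier, 1, 30000, 500)
after the namenode reports LIVE and before issuing any FS operations; the
threshold and timings here are placeholders.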
I think it would make sense to add a common interface to all services that have
workers that we could use to probe for this number, and we ought to think about
making both it and the ping() operation available over RPC, so that other tools
can check the cluster status.
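One possible shape for such an interface, as a sketch only. WorkerAwareService,
ClusterProbe and FakeMaster are hypothetical names, and the signature of ping()
here is an assumption rather than the one in the patch:

```java
import java.io.IOException;

/** Hypothetical common interface for services that have workers. */
interface WorkerAwareService {
    /** Assumed contract: throws IOException if the service is unhealthy. */
    void ping() throws IOException;

    /** Number of workers (datanodes, tasktrackers, ...) currently registered. */
    int getWorkerCount() throws IOException;
}

/** Hypothetical client-side check that a monitoring tool might run. */
final class ClusterProbe {
    private ClusterProbe() {}

    /** True only if the service answers ping() and has enough workers. */
    static boolean isUsable(WorkerAwareService service, int minWorkers) {
        try {
            service.ping();
            return service.getWorkerCount() >= minWorkers;
        } catch (IOException e) {
            return false;
        }
    }
}

/** In-memory stand-in for a namenode or jobtracker, for illustration only. */
final class FakeMaster implements WorkerAwareService {
    private final boolean healthy;
    private final int workers;

    FakeMaster(boolean healthy, int workers) {
        this.healthy = healthy;
        this.workers = workers;
    }

    @Override
    public void ping() throws IOException {
        if (!healthy) {
            throw new IOException("service is not healthy");
        }
    }

    @Override
    public int getWorkerCount() {
        return workers;
    }
}
```

Declaring IOException on both methods keeps the interface usable whether the
call is local or goes over a remote transport.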
> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -----------------------------------------------------------------------------
>
> Key: HADOOP-3628
> URL: https://issues.apache.org/jira/browse/HADOOP-3628
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs, mapred
> Affects Versions: 0.19.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: AbstractHadoopComponent.java, hadoop-3628.patch,
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch,
> hadoop-3628.patch, hadoop-3628.patch
>
>
> I'd like to propose we have a standard interface for Hadoop components, the
> things that get started or stopped when you bring up a namenode. Currently,
> some of these classes have a stop() or shutdown() method, with no standard
> name/interface, and there is no way of seeing if they are live, checking
> their health, or shutting them down reliably. Indeed, there is a tendency for
> the spawned threads to not want to die, requiring the entire process to be
> killed to stop the workers.
> Having a standard interface would make it easier for
> * management tools to manage the different things
> * monitoring the state of things
> * subclassing
> The last is interesting as right now TaskTracker and JobTracker start up
> threads in their constructor; that's very dangerous as subclasses may have
> their methods called before they are fully initialised. Adding this interface
> would be the right time to clean up the startup process so that subclassing
> is less risky.
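
The constructor hazard described in the issue can be avoided by moving thread
creation into an explicit start() call. The following is a sketch under that
assumption; SketchComponent and NamedComponent are hypothetical and are not the
AbstractHadoopComponent attached to the issue:

```java
/** Hypothetical base class: the constructor never starts a thread. */
abstract class SketchComponent {
    private Thread worker;

    protected SketchComponent() {
        // Deliberately empty: subclass constructors finish before any
        // worker thread can call an overridden method.
    }

    /** Starts the worker thread once; later calls are no-ops. */
    public final synchronized void start() {
        if (worker != null) {
            return;
        }
        worker = new Thread(this::doWork, getClass().getSimpleName() + "-worker");
        worker.start();
    }

    public final synchronized boolean isStarted() {
        return worker != null;
    }

    /**
     * Interrupts the worker and waits up to timeoutMillis for it to exit.
     *
     * @return true if the worker has terminated (or was never started)
     */
    public final boolean stop(long timeoutMillis) {
        Thread t;
        synchronized (this) {
            t = worker;
        }
        if (t == null) {
            return true;
        }
        t.interrupt();
        try {
            t.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    /** Body of the worker thread, supplied by the subclass. */
    protected abstract void doWork();
}

/** Subclass whose worker depends on a field set in its own constructor. */
final class NamedComponent extends SketchComponent {
    private final String name;
    volatile String seenByWorker;

    NamedComponent(String name) {
        this.name = name;
    }

    @Override
    protected void doWork() {
        // Safe: start() can only be called after construction completes.
        seenByWorker = name;
    }
}
```

Had the base constructor started the thread itself, doWork() could run before
NamedComponent's constructor had assigned name.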