[
https://issues.apache.org/jira/browse/HADOOP-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625511#action_12625511
]
Konstantin Shvachko commented on HADOOP-3628:
---------------------------------------------
Steve,
# In the patch you consider the name-node LIVE when it finishes initialization.
At this point the name-node starts serving data-nodes for registrations and
block reports, and clients for read-only requests.
Namespace modification and replication are prohibited until the name-node
is out of safe mode. In particular, jobs will not be able to run
because they cannot create input files or replicate configuration
files.
So either we should say that the name-node is LIVE when it is out of safe mode,
or we should introduce some other state that reflects the name-node's readiness
to perform its complete set of services.
I think we should consider the name-node LIVE when it comes out of safe mode.
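To make the proposal concrete, here is a minimal sketch of a state that only
becomes LIVE after safe mode is left. All names (NameNodeLivenessSketch, State,
FakeNameNode) are hypothetical and are not taken from the patch; the actual
ServiceState values may differ.

```java
// Hypothetical sketch; none of these names come from the HADOOP-3628 patch.
final class NameNodeLivenessSketch {

    /** Illustrative lifecycle states; the real ServiceState enum may differ. */
    enum State { CREATED, STARTED, LIVE }

    /** Stand-in for the name-node; tracks only the two facts discussed above. */
    static final class FakeNameNode {
        private boolean initialized;
        private boolean inSafeMode = true; // a name-node starts up in safe mode

        void finishInitialization() { initialized = true; }

        void leaveSafeMode() { inSafeMode = false; }

        /** LIVE only once namespace modification and replication are allowed. */
        State getState() {
            if (!initialized) {
                return State.CREATED;
            }
            return inSafeMode ? State.STARTED : State.LIVE;
        }
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        nn.finishInitialization();
        System.out.println(nn.getState()); // prints STARTED
        nn.leaveSafeMode();
        System.out.println(nn.getState()); // prints LIVE
    }
}
```

With this mapping, a job launcher that waits for LIVE would not start until it
can actually create input files.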
# JavaDocs should explicitly define each ServiceState for each server. E.g.
{code}
/** DataNode is LIVE if it successfully completed registration with the
name-node */
/** NameNode is LIVE if it is out of safe mode */
{code}
etc. Saying "the service is now live and available for external use" is
right but too general; it can be clarified in "native" terms for each server.
# Looking at your ping() implementation, it seems that the purpose of this
method is to report a string describing what exactly is wrong with the
server at the moment. So why not just return the string instead of
throwing an exception whose only purpose is to wrap that string? Imagine
how one would use this: catch the exception, call getMessage(), and then print
or parse the message.
As Doug advocated many times: "Exceptions should not be used for normal control
flow." I think for ping() this is normal control flow.
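A rough sketch of what I mean, with ping() returning the description directly.
The names (PingStatusSketch, FakeServer, describe) and the convention that an
empty string means "healthy" are assumptions made for illustration only; they
are not the signatures in the patch.

```java
// Hypothetical sketch; the ping() signature here is an assumption, not the patch's.
final class PingStatusSketch {

    /** Stand-in server whose ping() reports problems as a return value. */
    static final class FakeServer {
        private final boolean registered;

        FakeServer(boolean registered) {
            this.registered = registered;
        }

        /** Returns "" when healthy, otherwise a description of the problem. */
        String ping() {
            return registered ? "" : "not registered with the name-node";
        }
    }

    /** Caller side: no try/catch, just inspect the returned string. */
    static String describe(FakeServer server) {
        String problem = server.ping();
        return problem.isEmpty() ? "OK" : "UNHEALTHY: " + problem;
    }

    public static void main(String[] args) {
        System.out.println(describe(new FakeServer(true)));  // prints OK
        System.out.println(describe(new FakeServer(false))); // prints UNHEALTHY: not registered with the name-node
    }
}
```

The caller gets the same information it would have pulled out of getMessage(),
without a try/catch around what is an ordinary status query.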
> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -----------------------------------------------------------------------------
>
> Key: HADOOP-3628
> URL: https://issues.apache.org/jira/browse/HADOOP-3628
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs, mapred
> Affects Versions: 0.19.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: AbstractHadoopComponent.java, hadoop-3628.patch,
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch,
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch,
> hadoop-3628.patch
>
>
> I'd like to propose we have a standard interface for hadoop components, the
> things that get started or stopped when you bring up a namenode. Currently,
> some of these classes have a stop() or shutdown() method, with no standard
> name/interface, and there is no way of seeing if they are live, checking
> their health, or shutting them down reliably. Indeed, there is a tendency for
> the spawned threads to not want to die; to require the entire process to be
> killed to stop the workers.
> Having a standard interface would make it easier for
> * management tools to manage the different things
> * monitoring the state of things
> * subclassing
> The latter is interesting as right now TaskTracker and JobTracker start up
> threads in their constructor; that's very dangerous as subclasses may have
> their methods called before they are fully initialised. Adding this interface
> would be the right time to clean up the startup process so that subclassing
> is less risky.
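The constructor hazard in the quoted description can be avoided with a
two-phase startup, sketched below. This is only an illustration: Tracker,
CustomTracker, start() and awaitReport() are hypothetical names and do not
reflect the real TaskTracker or JobTracker code. The constructor assigns fields
only, and the worker thread is launched from an explicit start() call made
after the subclass constructor has finished.

```java
// Hypothetical sketch; not the actual TaskTracker/JobTracker code.
final class TwoPhaseStartSketch {

    /** Base class whose constructor starts no threads. */
    static class Tracker {
        private Thread worker;
        private volatile String report;

        Tracker() {
            // Deliberately no thread started here; see start().
        }

        /** Called on the worker thread; subclasses may override it. */
        protected String describe() {
            return "base";
        }

        /** Starts the worker once construction, including subclasses, is done. */
        final synchronized void start() {
            if (worker != null) {
                throw new IllegalStateException("already started");
            }
            worker = new Thread(() -> report = describe(), "tracker-worker");
            worker.start();
        }

        /** Waits for the worker to finish and returns what it reported. */
        final String awaitReport() {
            Thread t;
            synchronized (this) {
                t = worker;
            }
            if (t == null) {
                throw new IllegalStateException("not started");
            }
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted", e);
            }
            return report;
        }
    }

    /** Subclass with its own state, which the overridden method depends on. */
    static final class CustomTracker extends Tracker {
        private final String label;

        CustomTracker(String label) {
            this.label = label;
        }

        @Override
        protected String describe() {
            return "custom:" + label;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new CustomTracker("a");
        tracker.start(); // safe: label is already assigned
        System.out.println(tracker.awaitReport()); // prints custom:a
    }
}
```

Had the base constructor started the worker itself, describe() could have run
before label was assigned, which is the risk the description refers to.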