[
https://issues.apache.org/jira/browse/HDFS-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798364#action_12798364
]
Steve Loughran commented on HDFS-890:
-------------------------------------
{{DataNode.makeInstance()}} is used in:
# {{MiniDFSCluster.startDataNodes()}}; this code mistakenly assumes it can
never get a null reference back. It should move to any new method call.
# {{DataNode.createDataNode()}}, which is used in
{{MiniDFSCluster.restartDataNode()}}; that code also assumes it never sees a
null.
# {{DataNode.main()}}, which catches and logs any exception, and handles a
null value by skipping daemon startup and exiting with a 0 exit code.
# {{TestHDFSServerPorts}}
# MapReduce's {{TestMRServerPorts}} tests, which also assume that they never
see a null back.
Most of this code expects to see exceptions on failure, so it will handle a
stricter startup operation with ease. The interesting one is
{{DataNode.main()}}, which, if it caught the exception, would now exit with a
-1 exit code rather than 0. This would be a change in behaviour that is
visible to shell scripts: it would now be an error to attempt to start a
datanode when none of its data dirs are usable.
I would argue this is a feature: such an exit code would help people who are
wondering why their datanodes aren't coming up and aren't being reported. It
is the Unix way, and is much easier to test for. However, it would be a change
in behaviour.
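A minimal sketch of the exit-code behaviour discussed above, not the actual {{DataNode.main()}} code. The class {{DataNodeMainSketch}} and the methods {{usableDataDirs()}} and {{run()}} are hypothetical names invented for illustration; the only assumption taken from the discussion is that a startup with no usable data dirs throws and is mapped to a -1 exit code.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the datanode entry point; not the real Hadoop class. */
public class DataNodeMainSketch {

    /** Hypothetical strict check: throws unless at least one data dir is usable. */
    public static List<File> usableDataDirs(List<File> candidates) throws IOException {
        List<File> usable = new ArrayList<>();
        for (File dir : candidates) {
            if (dir.isDirectory() && dir.canWrite()) {
                usable.add(dir);
            }
        }
        if (usable.isEmpty()) {
            throw new IOException("No usable data directories in " + candidates);
        }
        return usable;
    }

    /** Maps startup success or failure to a process exit code. */
    public static int run(List<File> candidates) {
        try {
            List<File> dirs = usableDataDirs(candidates);
            System.out.println("Would start datanode with " + dirs);
            return 0;
        } catch (IOException e) {
            // Log the reason, then report failure to the caller.
            System.err.println("Datanode startup failed: " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) {
        List<File> candidates = new ArrayList<>();
        for (String arg : args) {
            candidates.add(new File(arg));
        }
        // A POSIX shell sees an exit code of -1 as status 255.
        System.exit(run(candidates));
    }
}
```

With this shape, a shell script can test the exit status directly instead of searching the logs for the reason a datanode never appeared.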
> Have a way of creating datanodes that throws an meaningful exception on
> failure
> -------------------------------------------------------------------------------
>
> Key: HDFS-890
> URL: https://issues.apache.org/jira/browse/HDFS-890
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Affects Versions: 0.22.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> In HDFS-884, I proposed printing out more details on why things fail. This is
> hard to test, because you need to subvert the log4j back end that your test
> harness will itself have grabbed.
> There is a way to make it testable, and to make it easier for anyone creating
> datanodes in process to recognise and handle failure: have a static
> CreateDatanode() method that throws exceptions when directories cannot be
> created or other problems arise. Right now some problems trigger failure,
> others just return a null reference saying "something went wrong but we won't
> tell you what -hope you know where the logs go".
> The HDFS-884 patch would be replaced by something that threw an exception;
> the existing methods would catch this, log it and return null. The new method
> would pass it straight up.
> This is easier to test, better for others. If people think this is good, I
> will code it up and mark the old API as deprecated.
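The proposal quoted above could be sketched roughly as follows. This is an illustration under stated assumptions, not the HDFS-890 patch: {{DataNodeFactorySketch}}, {{Node}}, {{createOrThrow()}} and {{createOrNull()}} are hypothetical names, and the only failure modelled is a data directory that cannot be created.

```java
import java.io.File;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of a strict factory plus a legacy null-returning wrapper. */
public class DataNodeFactorySketch {
    private static final Logger LOG =
        Logger.getLogger(DataNodeFactorySketch.class.getName());

    /** Placeholder for the real datanode; holds only the directory it was given. */
    public static final class Node {
        public final File dataDir;
        Node(File dataDir) { this.dataDir = dataDir; }
    }

    /** New-style method: failure surfaces as an exception carrying the reason. */
    public static Node createOrThrow(File dataDir) throws IOException {
        if (!dataDir.isDirectory() && !dataDir.mkdirs()) {
            throw new IOException("Cannot create data directory " + dataDir);
        }
        return new Node(dataDir);
    }

    /** Old-style method: same work, but failures are logged and reported as null. */
    @Deprecated
    public static Node createOrNull(File dataDir) {
        try {
            return createOrThrow(dataDir);
        } catch (IOException e) {
            LOG.log(Level.WARNING, "Datanode creation failed", e);
            return null;
        }
    }
}
```

A test can then assert on the exception from the strict method without intercepting the logging back end, while existing callers of the null-returning method keep their current behaviour.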