[ https://issues.apache.org/jira/browse/HADOOP-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Douglas updated HADOOP-1885:
----------------------------------
Status: Patch Available (was: Open)
> Race condition in MiniDFSCluster shutdown
> -----------------------------------------
>
> Key: HADOOP-1885
> URL: https://issues.apache.org/jira/browse/HADOOP-1885
> Project: Hadoop
> Issue Type: Bug
> Components: test
> Reporter: Chris Douglas
> Assignee: Chris Douglas
> Attachments: 1885.patch
>
>
> Hudson has been sporadically failing tests that start multiple datanodes in
> MiniDFSCluster, or that run right after such tests, particularly on Solaris
> and Windows. The following appears to be at least partially responsible (much
> credit to Nigel for helping to discern this).
> A common error:
> {noformat}
> java.io.IOException: Cannot remove data directory:
> /export/home/hudson/hudson/jobs/Hadoop-Nightly/workspace/trunk/build/test/data/dfs/data
> at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:126)
> at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:80)
> at org.apache.hadoop.dfs.TestFsck.testFsckNonExistent(TestFsck.java:96)
> {noformat}
> MiniDFSCluster starts multiple DataNodes by calling DataNode::createDataNode,
> which creates and starts a DataNode thread, assigns the instance to a static
> member, and returns the Runnable. Each call from MiniDFSCluster therefore
> overwrites the instance stored by the previous call. Since
> DataNode::shutdown() calls join() on that one shared Thread, every join after
> the first is essentially a no-op once that DataNode finishes, and the threads
> of the other DataNodes are never waited on. When MiniDFSCluster::shutdown()
> returns, those DataNodes may not have released their resources, so the next
> MiniDFSCluster may fail to start.
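
The shared-thread problem described above can be reproduced in isolation. The
following is only a minimal sketch, not the Hadoop code and not the attached
patch: the class Node, the field sharedThread, and the two shutdown methods are
hypothetical names chosen for illustration. It counts how many distinct threads
actually get joined when several nodes are shut down, first through a static
field that every create call overwrites, then through a per-instance reference.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

class SharedThreadJoinSketch {

    /** Hypothetical stand-in for a node whose worker thread must exit before cleanup. */
    static class Node implements Runnable {
        // Shared by all instances: every createNode() call overwrites it.
        static Thread sharedThread;

        private final CountDownLatch stop = new CountDownLatch(1);
        private Thread ownThread; // per-instance reference, used by the alternative shutdown

        static Node createNode() {
            Node node = new Node();
            Thread t = new Thread(node);
            node.ownThread = t;
            sharedThread = t; // threads of earlier nodes are no longer reachable from here
            t.start();
            return node;
        }

        @Override
        public void run() {
            try {
                stop.await(); // park until shutdown is requested
            } catch (InterruptedException e) {
                // interrupted during shutdown: fall through and exit
            }
        }

        /** Joins whatever thread the static field currently holds and returns it. */
        Thread shutdownViaSharedField() throws InterruptedException {
            stop.countDown();
            sharedThread.interrupt();
            sharedThread.join(); // returns immediately once that one thread is dead
            return sharedThread;
        }

        /** Joins this node's own thread and returns it. */
        Thread shutdownViaOwnThread() throws InterruptedException {
            stop.countDown();
            ownThread.join();
            return ownThread;
        }
    }

    /** Starts n nodes, shuts each one down, and counts the distinct threads that were joined. */
    static int distinctThreadsJoined(int n, boolean useSharedField) {
        try {
            Node[] nodes = new Node[n];
            for (int i = 0; i < n; i++) {
                nodes[i] = Node.createNode();
            }
            Set<Thread> joined = new HashSet<>();
            for (Node node : nodes) {
                joined.add(useSharedField
                        ? node.shutdownViaSharedField()
                        : node.shutdownViaOwnThread());
            }
            // Housekeeping for the demo only: wait for any thread nobody joined.
            for (Node node : nodes) {
                node.ownThread.join();
            }
            return joined.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("joined via shared field: " + distinctThreadsJoined(3, true));
        System.out.println("joined via own thread:   " + distinctThreadsJoined(3, false));
    }
}
```

With three nodes, the shared-field variant joins only one distinct thread (the
last one created), while the per-instance variant joins all three. In the
shared-field case nothing guarantees that the other two threads have exited by
the time the last shutdown call returns, which is the window the description
above refers to.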