[ https://issues.apache.org/jira/browse/HDFS-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028505#comment-13028505 ]
Eli Collins commented on HDFS-1878:
-----------------------------------
Does this affect trunk?
> TestHDFSServerPorts unit test failure - race condition in
> FSNamesystem.close() causes NullPointerException without serious consequence
> --------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-1878
> URL: https://issues.apache.org/jira/browse/HDFS-1878
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.204.0
> Reporter: Matt Foley
> Assignee: Matt Foley
> Priority: Minor
> Fix For: 0.20.205.0
>
> Attachments: 1878-1.patch
>
>
> In 20.204, TestHDFSServerPorts was observed to intermittently throw a
> NullPointerException. This only happens when FSNamesystem.close() is called,
> which means system termination for the Namenode, so this is not a serious bug
> for .204. TestHDFSServerPorts is more likely than normal execution to
> trigger the race because it runs two Namenodes in the same JVM, which causes
> more thread interleaving and more opportunity for the race to show up.
> The race is in FSNamesystem.close(). At line 566, we have:
>   if (replthread != null) replthread.interrupt();
>   if (replmon != null) replmon = null;
> Since close() does not wait for the interrupted replthread, replmon can be
> nulled before replthread is dead. replthread still references replmon in
> computeDatanodeWork(), which is where the NullPointerException occurs.
> The solution is either to wait on replthread or simply not to null replmon. The
> latter is preferred, since none of the sibling Namenode processing threads
> are waited on in close().
> I'll attach a patch for .205.
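
The two options in the description can be sketched as follows. This is a minimal illustration, not the attached patch: the class ReplMonitorCloseSketch, the nested Monitor class, and the methods closeWithJoin(), closeWithoutNulling(), awaitWorker(), and failure() are hypothetical names. Only replthread, replmon, and computeDatanodeWork() are taken from the issue text, and here they are simplified stand-ins rather than the real FSNamesystem members.

```java
/**
 * Sketch of the close() race described above. All names here are
 * hypothetical stand-ins, not the real FSNamesystem code.
 */
class ReplMonitorCloseSketch {

    /** Stand-in for the replication monitor that the worker thread uses. */
    static final class Monitor {
        private long rounds; // touched only by the worker thread

        void computeDatanodeWork() {
            rounds++;
        }
    }

    private volatile Monitor replmon = new Monitor();
    private volatile Throwable failure; // error seen by the worker, if any
    private final Thread replthread = new Thread(this::runWorker, "replthread");

    private void runWorker() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Throws NullPointerException if close() nulled replmon
                // while this loop was still running.
                replmon.computeDatanodeWork();
            }
        } catch (Throwable t) {
            failure = t;
        }
    }

    /** Creates the sketch object and starts its worker thread. */
    static ReplMonitorCloseSketch startNew() {
        ReplMonitorCloseSketch s = new ReplMonitorCloseSketch();
        s.replthread.start();
        return s;
    }

    /** Option 1: wait for the worker to die before dropping the reference. */
    void closeWithJoin() {
        replthread.interrupt();
        awaitWorker();
        replmon = null; // safe: the worker can no longer dereference it
    }

    /** Option 2 (preferred in the issue): interrupt only, never null replmon. */
    void closeWithoutNulling() {
        replthread.interrupt();
    }

    /** Blocks until the worker thread has terminated. */
    void awaitWorker() {
        try {
            replthread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    Throwable failure() {
        return failure;
    }

    public static void main(String[] args) {
        ReplMonitorCloseSketch joined = startNew();
        joined.closeWithJoin();
        System.out.println("join variant failure: " + joined.failure());

        ReplMonitorCloseSketch kept = startNew();
        kept.closeWithoutNulling();
        kept.awaitWorker(); // only so the result can be inspected
        System.out.println("no-null variant failure: " + kept.failure());
    }
}
```

Neither variant lets the worker observe a null replmon. The second matches the preference stated in the description, since it adds no waiting to close().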