[ https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Foley updated HDFS-2702:
-----------------------------
Fix Version/s: (was: 1.1.0)
> A single failed name dir can cause the NN to exit
> --------------------------------------------------
>
> Key: HDFS-2702
> URL: https://issues.apache.org/jira/browse/HDFS-2702
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 1.0.0
> Reporter: Eli Collins
> Assignee: Eli Collins
> Priority: Critical
> Fix For: 1.0.2
>
> Attachments: hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt,
> hdfs-2702.txt, hdfs-2702.txt
>
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close(); // So editStreams.size() is 0
> foreach edits dir {
>   ..
>   try {
>     eStream = new ... // Might get an IOE here
>     editStreams.add(eStream);
>   } catch (IOException ioe) {
>     removeEditsForStorageDir(sd); // exits if editStreams.size() <= 1
>   }
> }
> {code}
> If we get an IOException before we've added two edit streams to the list,
> we'll exit; e.g., if there's an error processing the 1st name dir, we'll exit
> even if there are 4 valid name dirs. The fix is to move the check out of
> removeEditsForStorageDir (née processIOError), or to modify it so it can be
> disabled in some cases, e.g., here, where we don't yet know how many streams
> are valid.
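To make the failure mode concrete, here is a minimal Java sketch under a heavily simplified model. The class `RollEditLogSketch`, the `StorageDir` holder, `openStream`, and `NameNodeExitException` are all hypothetical; edit streams are plain strings, and the NN's process exit is modelled as an exception so the sketch can run. `rollEditLogBuggy` reproduces the pattern quoted above, and `rollEditLogFixed` shows one of the two fixes the issue suggests (moving the check out of removeEditsForStorageDir to after the loop). It is an illustration only, not the patch attached to this issue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the HDFS-2702 pattern; NOT the real FSEditLog code.
// Storage dirs are reduced to a name plus a "failed" flag, and the NN's
// process exit is modelled as an unchecked exception so the sketch can run.
public class RollEditLogSketch {

  /** Stand-in for the NN exiting (the real code terminates the process). */
  static class NameNodeExitException extends RuntimeException {
    NameNodeExitException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical storage dir; opening its edits stream fails if 'failed' is set. */
  static final class StorageDir {
    final String name;
    final boolean failed;

    StorageDir(String name, boolean failed) {
      this.name = name;
      this.failed = failed;
    }
  }

  // Edit streams are modelled as strings naming the dir they write to.
  private final List<String> editStreams = new ArrayList<>();

  private String openStream(StorageDir sd) throws IOException {
    if (sd.failed) {
      throw new IOException("cannot open edits in " + sd.name);
    }
    return "edits.new@" + sd.name;
  }

  // Mirrors the check quoted in the issue: exit if editStreams.size() <= 1.
  // (The stream for sd was never added, so there is nothing to remove here.)
  private void removeEditsForStorageDir(StorageDir sd) {
    if (editStreams.size() <= 1) {
      throw new NameNodeExitException("no usable edits dirs after " + sd.name);
    }
  }

  /** Buggy pattern: the check runs while the list is still being rebuilt. */
  void rollEditLogBuggy(List<StorageDir> dirs) {
    editStreams.clear(); // close(): editStreams.size() is now 0
    for (StorageDir sd : dirs) {
      try {
        editStreams.add(openStream(sd));
      } catch (IOException ioe) {
        removeEditsForStorageDir(sd); // exits even if later dirs are fine
      }
    }
  }

  /** One possible fix: drop failed dirs, then check once after the loop. */
  void rollEditLogFixed(List<StorageDir> dirs) {
    editStreams.clear();
    for (StorageDir sd : dirs) {
      try {
        editStreams.add(openStream(sd));
      } catch (IOException ioe) {
        // Skip this dir; we don't yet know how many streams are valid.
      }
    }
    if (editStreams.isEmpty()) {
      throw new NameNodeExitException("no usable edits dirs");
    }
  }

  List<String> streams() {
    return editStreams;
  }

  public static void main(String[] args) {
    List<StorageDir> dirs = Arrays.asList(
        new StorageDir("dir1", true), // the 1st name dir fails
        new StorageDir("dir2", false),
        new StorageDir("dir3", false),
        new StorageDir("dir4", false),
        new StorageDir("dir5", false));

    RollEditLogSketch log = new RollEditLogSketch();
    try {
      log.rollEditLogBuggy(dirs);
      System.out.println("buggy: NN keeps running");
    } catch (NameNodeExitException e) {
      System.out.println("buggy: NN exits (" + e.getMessage() + ")");
    }

    log.rollEditLogFixed(dirs);
    System.out.println("fixed: " + log.streams().size() + " edit streams");
  }
}
```

With the 1st of 5 dirs failing, the buggy roll "exits" on dir1 while the fixed roll keeps the 4 valid edit streams. Deferring the check until every dir has been tried is what lets the NN tolerate a single failed name dir; it still exits if no dir is usable.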
--
This message is automatically generated by JIRA.