A single failed name dir can cause the NN to exit 
--------------------------------------------------

                 Key: HDFS-2702
                 URL: https://issues.apache.org/jira/browse/HDFS-2702
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 0.20.205.0
            Reporter: Eli Collins
            Assignee: Eli Collins
            Priority: Critical
             Fix For: 1.1.0


There's a bug in FSEditLog#rollEditLog which results in the NN process exiting 
if a single name dir has failed. Here's the relevant code:

{code}
close()  // closes every stream, so editStreams.size() is now 0
foreach edits dir sd {
  try {
    ..
    eStream = new ...  // Might get an IOE here
    editStreams.add(eStream);
  } catch (IOException ioe) {
    removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1
  }
}
{code}

If we get an IOException before we've added two edits streams to the list, 
we'll exit. E.g., if there's an error processing the 1st name dir we'll exit 
even if there are 4 valid name dirs. The fix is to move the check out of 
removeEditsForStorageDir (née processIOError), or to modify it so the check 
can be disabled in some cases, e.g. here, where we don't yet know how many 
streams are valid.
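
One way to picture the first option (moving the check out of the per-dir 
error path) is the sketch below. It is not the actual patch and does not use 
the real FSEditLog API: the class EditLogRollSketch, the StreamOpener 
interface, the openAll method and the dir names are all hypothetical 
stand-ins, and treating "zero usable streams" as the fatal condition is an 
assumption made for illustration. The point is only that failures are 
recorded while the loop runs and the "do we have enough streams?" decision is 
made once, after every dir has been tried.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names exist in Hadoop.
class EditLogRollSketch {

    /** Hypothetical stand-in for opening an edits stream in one name dir. */
    interface StreamOpener {
        String open(String dir) throws IOException;
    }

    /**
     * Tries every dir, recording failures instead of acting on them
     * immediately. The stream-count check runs only after the loop,
     * when the number of valid streams is actually known.
     */
    static List<String> openAll(List<String> dirs, StreamOpener opener,
                                List<String> failedDirs) throws IOException {
        List<String> streams = new ArrayList<>();
        for (String dir : dirs) {
            try {
                streams.add(opener.open(dir));
            } catch (IOException ioe) {
                // Record the failed dir; do not count streams yet.
                failedDirs.add(dir);
            }
        }
        // Assumed fatal condition: no dir at all could be opened.
        if (streams.isEmpty()) {
            throw new IOException("No usable edits dirs; failed: " + failedDirs);
        }
        return streams;
    }

    public static void main(String[] args) throws IOException {
        // Simulated opener: any dir whose name starts with "bad" fails.
        StreamOpener opener = dir -> {
            if (dir.startsWith("bad")) {
                throw new IOException("cannot open " + dir);
            }
            return dir + "/edits.new";
        };
        List<String> failed = new ArrayList<>();
        List<String> streams = openAll(
            Arrays.asList("bad0", "nn1", "nn2", "nn3", "nn4"), opener, failed);
        System.out.println(streams.size() + " streams opened, failed dirs: " + failed);
    }
}
```

With the first dir failing and four valid dirs following it, the sketch 
keeps going and ends up with four streams, which is the case described above 
where the current code exits.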
