[ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2702:
------------------------------

    Attachment: hdfs-2702.txt

#1-2 Good suggestions. Done.
#3 Yes, there is no Guava in branch-1.
#4 That's what causes the bug: on log roll, when we close all streams and then 
re-add them, we exit the NN when we don't mean to. I considered folding the 
check into removeEditsForStorageDir and making it conditional (so we don't 
trigger it during log roll), but there are a number of places where we want to 
trigger this check without removing edits, so it seemed cleaner to always call 
it explicitly.
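For readers following the thread, the "always call it explicitly" approach from #4 might look roughly like the sketch below. It is not the attached patch: EditLogSketch, exitIfNoStreams, and the use of directory names in place of real streams are hypothetical stand-ins, and an exception stands in for the NN exiting. Only removeEditsForStorageDir and rollEditLog are names taken from the issue.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FSEditLog; directory names stand in for edit streams.
class EditLogSketch {
    final List<String> editStreams = new ArrayList<String>();

    // Removal no longer decides whether the process should exit.
    void removeEditsForStorageDir(String dir) {
        editStreams.remove(dir);
    }

    // Hypothetical explicit check: callers invoke it once the stream list
    // is in its final state. Throwing stands in for exiting the NN.
    void exitIfNoStreams() {
        if (editStreams.isEmpty()) {
            throw new IllegalStateException("No edit streams are accessible");
        }
    }

    // Log roll: reopen one stream per dir, tolerate individual failures,
    // and run the check once after every dir has been tried.
    void rollEditLog(List<String> dirs, List<String> failedDirs) {
        editStreams.clear(); // mirrors close(): the list starts empty
        for (String dir : dirs) {
            try {
                if (failedDirs.contains(dir)) {
                    throw new IOException("cannot open " + dir);
                }
                editStreams.add(dir);
            } catch (IOException ioe) {
                removeEditsForStorageDir(dir);
            }
        }
        exitIfNoStreams();
    }
}
```

With four dirs and one failure the roll completes with three streams. The check only fires when every dir has failed.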
                
> A single failed name dir can cause the NN to exit 
> --------------------------------------------------
>
>                 Key: HDFS-2702
>                 URL: https://issues.apache.org/jira/browse/HDFS-2702
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Critical
>         Attachments: hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt, 
> hdfs-2702.txt
>
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close()  // So editStreams.size() is 0
> foreach edits dir {
>   try {
>     ..
>     eStream = new ...  // Might get an IOE here
>     editStreams.add(eStream);
>   } catch (IOException ioe) {
>     removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1
>   }
> }
> {code}
> If we get an IOException before we've added two edit streams to the list 
> we'll exit, e.g. if there's an error processing the 1st name dir we'll exit 
> even if there are 4 valid name dirs. The fix is to move the check out of 
> removeEditsForStorageDir (née processIOError) or modify it so it can be 
> disabled in some cases, e.g. here, where we don't yet know how many streams 
> are valid.
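
The failure mode in the description can be reproduced in isolation with a small simulation. This is a sketch under assumptions, not Hadoop code: RollBugDemo and rollWouldExit are hypothetical names, directory names stand in for edit streams, and a boolean return value stands in for the NN exiting. The size() <= 1 threshold is the one quoted in the description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the roll loop with the check inside the removal path.
class RollBugDemo {
    // Returns true if the simulated NN would have exited during the roll.
    static boolean rollWouldExit(List<String> dirs, String failingDir) {
        List<String> editStreams = new ArrayList<String>(); // empty, as after close()
        for (String dir : dirs) {
            try {
                if (dir.equals(failingDir)) {
                    throw new IOException("cannot open " + dir);
                }
                editStreams.add(dir);
            } catch (IOException ioe) {
                // Mirrors the check described for removeEditsForStorageDir.
                if (editStreams.size() <= 1) {
                    return true;
                }
                editStreams.remove(dir);
            }
        }
        return false;
    }
}
```

With dirs d1 through d4, a failure on d1 or d2 makes rollWouldExit return true even though the remaining dirs are healthy, while a failure on d3 or d4 returns false because two streams are already in the list.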

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
