[ 
https://issues.apache.org/jira/browse/HDFS-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172025#comment-13172025
 ] 

Todd Lipcon commented on HDFS-2702:
-----------------------------------

- in {{fatalExit}}, can you change it to:
{code}
FSNamesystem.LOG.fatal(msg, new Exception(msg));
{code}
so that we get a stacktrace in the logs?

- in {{exitIfNoStreams}} use {{isEmpty}} instead of comparing {{size() == 0}}
- rather than an {{if...throw AssertionError}} maybe just use the 
{{Preconditions.checkState}} function from guava? Or is guava not in branch-1 
yet? (can't remember)
- instead of calling {{exitIfNoStreams}} everywhere, maybe 
{{removeEditsForStorageDir}} can just call it whenever it removes one?
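
A rough sketch of how these suggestions could fit together, not the code from the patch. The class {{EditLogSketch}}, the string-based stream list, the error messages, and the local {{checkState}} helper are all hypothetical; the helper stands in for guava's {{Preconditions.checkState}} in case guava is unavailable, and {{java.util.logging}} stands in for the NN logger. {{fatalExit}} throws here rather than terminating the process so the sketch can be exercised.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the edit log class; names other than fatalExit,
// exitIfNoStreams and removeEditsForStorageDir are assumptions.
class EditLogSketch {
    private static final Logger LOG = Logger.getLogger("EditLogSketch");
    private final List<String> editStreams = new ArrayList<>();

    void addStream(String dir) {
        editStreams.add(dir);
    }

    int streamCount() {
        return editStreams.size();
    }

    // Passing an exception to the logger records a stack trace with the message.
    // Real code would terminate the process; this sketch throws instead.
    void fatalExit(String msg) {
        LOG.log(Level.SEVERE, msg, new Exception(msg));
        throw new IllegalStateException(msg);
    }

    // isEmpty() instead of size() == 0.
    void exitIfNoStreams() {
        if (editStreams.isEmpty()) {
            fatalExit("No edit streams are accessible");
        }
    }

    // Removal runs the check itself, so callers cannot forget it.
    void removeEditsForStorageDir(String dir) {
        // Hypothetical precondition, shown only to illustrate checkState.
        checkState(editStreams.contains(dir), "no edit stream for " + dir);
        editStreams.remove(dir);
        exitIfNoStreams();
    }

    // Plain-Java equivalent of guava's Preconditions.checkState(boolean, Object).
    static void checkState(boolean expression, String errorMessage) {
        if (!expression) {
            throw new IllegalStateException(errorMessage);
        }
    }
}
```

With two streams registered, removing the first leaves the log usable and removing the second reaches {{fatalExit}}.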

Otherwise looks good.
                
> A single failed name dir can cause the NN to exit 
> --------------------------------------------------
>
>                 Key: HDFS-2702
>                 URL: https://issues.apache.org/jira/browse/HDFS-2702
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Eli Collins
>            Assignee: Eli Collins
>            Priority: Critical
>         Attachments: hdfs-2702.txt, hdfs-2702.txt, hdfs-2702.txt
>
>
> There's a bug in FSEditLog#rollEditLog which results in the NN process 
> exiting if a single name dir has failed. Here's the relevant code:
> {code}
> close();  // So editStreams.size() is 0
> for each edits dir sd {
>   try {
>     ..
>     eStream = new ...  // Might get an IOE here
>     editStreams.add(eStream);
>   } catch (IOException ioe) {
>     removeEditsForStorageDir(sd);  // exits if editStreams.size() <= 1
>   }
> }
> {code}
> If we get an IOException before we've added two edit streams to the list,
> we'll exit; e.g., if there's an error processing the 1st name dir, we'll exit
> even if there are 4 valid name dirs. The fix is to move the check out of
> removeEditsForStorageDir (née processIOError) or modify it so it can be
> disabled in some cases, e.g., here, where we don't yet know how many streams
> are valid.
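
For illustration, a minimal sketch of the first option from the description: the exit check is moved out of the per-directory error handling and runs once after every name dir has been tried. {{RollSketch}}, {{openStream}}, the {{failingDirs}} parameter, and the use of plain strings as streams are hypothetical and exist only to simulate failures; this is not the actual fix.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the roll loop; streams are represented by dir names.
class RollSketch {
    // Returns the dirs whose streams opened; failingDirs simulates IOExceptions.
    static List<String> rollEditLog(List<String> nameDirs, Set<String> failingDirs) {
        List<String> editStreams = new ArrayList<>();  // empty, as after close()
        for (String dir : nameDirs) {
            try {
                editStreams.add(openStream(dir, failingDirs));
            } catch (IOException ioe) {
                // Skip the failed dir without an exit check: the number of
                // valid streams is not known until the loop finishes.
            }
        }
        // Single check once all dirs have been attempted.
        if (editStreams.isEmpty()) {
            throw new IllegalStateException("No edit streams are accessible");
        }
        return editStreams;
    }

    // Stand-in for opening an edit stream; fails for dirs listed in failingDirs.
    static String openStream(String dir, Set<String> failingDirs) throws IOException {
        if (failingDirs.contains(dir)) {
            throw new IOException("cannot open edits in " + dir);
        }
        return dir;
    }
}
```

In this model a failure on the first of five name dirs leaves four streams open, and the process would exit only when every dir fails.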


        
