[ 
https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022883#comment-13022883
 ] 

Eli Collins commented on HDFS-1594:
-----------------------------------

bq. I disagree. I think losing one of the (supposedly redundant) volumes is 
sufficient cause for alarm as to warrant the whole thing being put into SM.

The NN can tolerate the failure of an edit log volume and continue operating, so 
why does running out of disk space warrant a different outcome? You could make 
the case that running out of disk space is a correlated failure (it's likely to 
affect the other disks equally if they have similar capacities), but it may not 
be (eg a network mount may fill up earlier or have more free space), in which 
case we'd be less available than we otherwise could be. However, I think what 
you have makes sense for now. After all, an admin can always prevent the NN from 
entering SM by monitoring the volumes and ensuring they all have sufficient free space.
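
For the monitoring angle, something like the following run periodically would 
warn an admin well before the NN's threshold is hit. This is just a sketch: the 
directories and the 1 GB alert margin are examples, not anything from the patch.

{code:java}
// Rough sketch of an external free-space check an admin could run periodically.
// The directories and the 1 GB alert margin are examples, not patch defaults.
import java.io.File;

public class NameDirSpaceCheck {
  public static void main(String[] args) {
    long alertBytes = 1024L * 1024 * 1024; // warn while there is still plenty of headroom
    String[] dirs = { "/data/1/dfs/name", "/data/2/dfs/name" }; // example dfs.name.dir entries
    for (String dir : dirs) {
      long free = new File(dir).getUsableSpace();
      if (free < alertBytes) {
        System.err.println("LOW SPACE: " + dir + " has only " + free + " bytes free");
      }
    }
  }
}
{code}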

bq. I'm in favor of the current implementation - a single configurable 
threshold, which applies to each volume separately.

That sounds good to me. Since the threshold is a function of how much slack the 
NN needs (ie it's independent of the volume size), a single value enforced across 
all volumes makes sense.
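
Just to make sure we're picturing the same thing, the check I have in mind is 
roughly the following. Sketch only; the class and method names are placeholders, 
not the patch's actual API.

{code:java}
// Sketch only: one reserved-space threshold applied to every volume,
// independent of each volume's capacity. Names are placeholders, not the patch.
import java.io.File;
import java.util.Collection;

class ReservedSpaceCheck {
  /** Returns true only if every volume has at least reservedBytes free. */
  static boolean allVolumesHaveSpace(Collection<File> volumes, long reservedBytes) {
    for (File volume : volumes) {
      if (volume.getUsableSpace() < reservedBytes) {
        return false; // any single volume below the threshold fails the check
      }
    }
    return true;
  }
}
{code}

ie if any single volume drops below the threshold the NN enters SM, which matches 
the behavior you describe.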

I'll check out your latest patch. In the meantime, note that the TestStartup 
and TestEditLog failures are due to the patch (the NN is going into SM; it looks 
like the threshold is getting crossed).
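
If the intent is for those tests to opt out of the check rather than depend on 
free space on the build machines, they could set the threshold very low in their 
conf. A sketch, assuming the patch exposes the reserved space as a long config 
value; I'm guessing at the key name here, so adjust it to whatever the patch uses.

{code:java}
// Sketch of a test-side workaround: shrink the reserved-space threshold so the
// mini-cluster NN doesn't enter SM on a nearly full build machine.
// The config key name below is a guess at what the patch uses, not confirmed.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class LowThresholdClusterSetup {
  static MiniDFSCluster start() throws Exception {
    Configuration conf = new HdfsConfiguration();
    conf.setLong("dfs.namenode.resource.du.reserved", 0L); // assumed key name
    return new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
  }
}
{code}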

> When the disk becomes full Namenode is getting shutdown and not able to 
> recover
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-1594
>                 URL: https://issues.apache.org/jira/browse/HDFS-1594
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.21.1, 0.22.0
>         Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28 
> 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Devaraj K
>            Assignee: Aaron T. Myers
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1594.patch, HDFS-1594.patch, HDFS-1594.patch, 
> hadoop-root-namenode-linux124.log, hdfs-1594.0.patch, hdfs-1594.1.patch, 
> hdfs-1594.2.patch, hdfs-1594.3.patch, hdfs-1594.4.patch
>
>
> When the disk becomes full, the name node shuts down, and if we try to start it 
> after making space available, it does not start and throws the exception below.
> {code:xml} 
> 2011-01-24 23:23:33,727 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:180)
>       at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
>       at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
>       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
>       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
>       at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,729 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:180)
>       at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
>       at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
>       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
>       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>       at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
>       at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
>       at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
>       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
>       at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at linux124/10.18.52.124
> ************************************************************/
> {code} 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
