[
https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022883#comment-13022883
]
Eli Collins commented on HDFS-1594:
-----------------------------------
bq. I disagree. I think losing one of the (supposedly redundant) volumes is
sufficient cause for alarm as to warrant the whole thing being put into SM.
The NN can tolerate the failure of an edit log volume and continue operation,
so why does running out of disk space warrant a different outcome? You could
make the case that running out of disk space is a correlated failure (it is
likely to affect the other disks equally if they have similar capacities), but
it may not be (e.g. the network mount may run out early or have more free
space), in which case we'd be less available than we otherwise could be.
However, I think what you have makes sense for now. After all, an admin can
always prevent the NN from entering SM by monitoring and ensuring all volumes
have sufficient free space.
bq. I'm in favor of the current implementation - a single configurable
threshold, which applies to each volume separately.
That sounds good to me. Since the threshold is a function of how much slack the
NN needs (i.e. it is independent of the volume size), a single value enforced
across all volumes makes sense.
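The approach above (one configured free-space threshold, checked against each volume independently) can be sketched roughly as follows. This is a minimal illustration, not the actual patch; the class and method names here are hypothetical:

```java
import java.io.File;
import java.util.List;

// Hypothetical sketch of a per-volume free-space check: a single
// reserved-bytes threshold is applied to every edits volume separately.
// If any one volume falls below it, the NN would enter safe mode.
public class VolumeSpaceCheck {

    // Returns true only if every volume has at least reservedBytes free.
    static boolean hasAvailableDiskSpace(List<File> volumes, long reservedBytes) {
        for (File vol : volumes) {
            if (vol.getUsableSpace() < reservedBytes) {
                return false; // one low volume is enough to trip safe mode
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Example: check the current directory's volume against a 100 MB reserve.
        List<File> volumes = List.of(new File("."));
        long reservedBytes = 100L * 1024 * 1024;
        System.out.println(hasAvailableDiskSpace(volumes, reservedBytes));
    }
}
```

Note the check is per volume, not aggregate: a nearly full local disk trips the threshold even if a network mount still has plenty of space, which matches the single-threshold-per-volume behavior discussed above.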
I'll check out your latest patch. In the meantime, note that the TestStartup
and TestEditLog failures are due to the patch (the NN is going into SM; it
looks like the threshold is being crossed).
> When the disk becomes full Namenode is getting shutdown and not able to
> recover
> -------------------------------------------------------------------------------
>
> Key: HDFS-1594
> URL: https://issues.apache.org/jira/browse/HDFS-1594
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0, 0.21.1, 0.22.0
> Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28
> 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Devaraj K
> Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: HDFS-1594.patch, HDFS-1594.patch, HDFS-1594.patch,
> hadoop-root-namenode-linux124.log, hdfs-1594.0.patch, hdfs-1594.1.patch,
> hdfs-1594.2.patch, hdfs-1594.3.patch, hdfs-1594.4.patch
>
>
> When the disk becomes full, the NameNode shuts down, and if we try to start
> it after making space available, it fails to start and throws the exception
> below.
> {code}
> 2011-01-24 23:23:33,727 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
> initialization failed.
> java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,729 ERROR
> org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at linux124/10.18.52.124
> ************************************************************/
> {code}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira