[ https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron T. Myers updated HDFS-1594:
---------------------------------
Attachment: hdfs-1594.0.patch
Modified the patch posted on 2/16 in the following ways:
* Changed disk amount configuration from "percentage used" to "bytes remaining".
* Separated configuration of minimum disk space reserved from configuration of
minimum heap remaining.
* Renamed NameNodeResourceBean to NameNodeResourceChecker.
* Added some log output.
* Some style cleanup.
* Added more comments.
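To illustrate the "bytes remaining" check, here is a minimal sketch. It is not the code from the attached patch: the class name, method name, and the 100 MB reserve are hypothetical, and it relies only on the standard java.io.File.getUsableSpace() call.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the NameNodeResourceChecker from the patch. */
class DiskReserveCheck {

    /** Returns the volumes whose usable space is below reservedBytes. */
    static List<File> volumesBelowReserve(List<File> volumes, long reservedBytes) {
        List<File> low = new ArrayList<>();
        for (File volume : volumes) {
            // getUsableSpace() returns 0 when the path does not name a
            // partition, so a missing volume is also reported as low.
            if (volume.getUsableSpace() < reservedBytes) {
                low.add(volume);
            }
        }
        return low;
    }

    public static void main(String[] args) {
        // Assumed values, for illustration only.
        List<File> volumes =
            Arrays.asList(new File(System.getProperty("java.io.tmpdir")));
        long reservedBytes = 100L * 1024 * 1024; // 100 MB reserve

        System.out.println("Volumes below reserve: "
            + volumesBelowReserve(volumes, reservedBytes));
    }
}
```

Comparing against an absolute byte count means the same reserve applies regardless of how large each volume is, which is the motivation for moving away from a percentage.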
I have a few open questions which I'd love to get some feedback on:
# The original JIRA description indicated that this problem was caused only by
the disk filling up, yet the original patches also monitor for a near-full JVM
heap. I left the memory checking in this reworked patch, but I think this
feature should probably be removed. It is unclear how it would interact with
Java GC, and whether entering safemode would actually help the situation.
# Todd mentioned that he's seen edit log corruptions from the log volume
filling up. Perhaps we should add an additional configuration option to let the
user specify arbitrary volumes to check, besides just the volumes containing
the edits/name dirs?
# I switched the configuration of disk space amount from a percentage to a
number of bytes remaining, since volume sizes may differ, and thus a fixed
amount of space reserved seems more appropriate. Perhaps there should be a way
to specify the threshold per-volume?
# I'm a little concerned that we might see a problem where the NN will reach
the threshold and then thrash in and out of safemode as it sits on the cusp of
the configured free space. Perhaps we should not automatically leave safemode
in the event the resources later return to normal? Or make this behavior
configurable? It seems to me that an NN volume running out of space should be a
cause for concern, so it might be reasonable for an admin to have to manually
force the NN out of safemode.
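To make the thrashing concern in the last question concrete, here is a small sketch of the two policies. Everything in it is hypothetical (the class, the autoLeave switch, and the sample values); it only models the state transitions and is not NameNode code.

```java
/** Hypothetical model of the safemode policy discussed above. */
class ResourceSafeModeLatch {
    private final long reservedBytes;
    private final boolean autoLeave; // assumed configuration switch
    private boolean inSafeMode = false;

    ResourceSafeModeLatch(long reservedBytes, boolean autoLeave) {
        this.reservedBytes = reservedBytes;
        this.autoLeave = autoLeave;
    }

    /** Called on each periodic check; returns whether we are in safemode. */
    boolean update(long freeBytes) {
        if (freeBytes < reservedBytes) {
            inSafeMode = true;          // always enter when space is low
        } else if (autoLeave) {
            inSafeMode = false;         // leave only if configured to
        }
        return inSafeMode;
    }

    /** Manual exit by an admin; refused while space is still low. */
    boolean leaveManually(long freeBytes) {
        if (freeBytes >= reservedBytes) {
            inSafeMode = false;
        }
        return inSafeMode;
    }

    public static void main(String[] args) {
        // Free space hovering around a reserve of 100 (arbitrary units).
        long[] samples = {120, 90, 110, 95, 105};
        ResourceSafeModeLatch auto = new ResourceSafeModeLatch(100, true);
        ResourceSafeModeLatch sticky = new ResourceSafeModeLatch(100, false);
        for (long free : samples) {
            System.out.println("free=" + free
                + " autoLeave=" + auto.update(free)
                + " sticky=" + sticky.update(free));
        }
    }
}
```

With autoLeave enabled the state flips on every sample that crosses the threshold, while the sticky variant stays in safemode until leaveManually() is called with enough free space.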
I should also mention that I did some manual testing of this patch by setting
the reserve amount to a little less than the amount of space free on my hard
drive, and then creating a large file from /dev/urandom to observe the NN
entering/leaving safemode as the threshold was reached.
> When the disk becomes full Namenode is getting shutdown and not able to
> recover
> -------------------------------------------------------------------------------
>
> Key: HDFS-1594
> URL: https://issues.apache.org/jira/browse/HDFS-1594
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0, 0.21.1, 0.22.0
> Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28
> 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Devaraj K
> Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: HDFS-1594.patch, HDFS-1594.patch, HDFS-1594.patch,
> hadoop-root-namenode-linux124.log, hdfs-1594.0.patch
>
>
> When the disk becomes full, the NameNode shuts down. If we try to start it
> again after making space available, it fails to start and throws the
> exception below.
> {code:xml}
> 2011-01-24 23:23:33,727 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
> at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,729 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
> at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at linux124/10.18.52.124
> ************************************************************/
> {code}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira