[ https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aaron T. Myers updated HDFS-1594:
---------------------------------
Attachment: hdfs-1594.5.patch
bq. In the mean time note that the TestStartup and TestEditLog failures are due
to the patch (the NN is going into SM, looks like the threshold is getting
crossed).
The tests were failing because of a bug, but not quite that one. The previous
implementation waited for the NN resource monitoring thread to run before
checking for available resources. In this patch, I've changed things around
slightly so that an initial resource check is done at FSNamesystem
initialization time.
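To make the ordering concrete, here is a rough sketch of the idea (method and field names are illustrative, not necessarily those in the patch):
{code:java}
// Illustrative sketch only -- names are assumptions, not the patch itself.
void initialize(Configuration conf) throws IOException {
  // ... existing FSNamesystem initialization ...
  nnResourceChecker = new NameNodeResourceChecker(conf);
  // Do one synchronous check up front, so a NameNode started on a nearly
  // full disk enters safe mode immediately rather than waiting for the
  // first run of the background resource-monitoring thread.
  checkAvailableResources();
  if (!nameNodeHasResourcesAvailable()) {
    enterSafeMode(true); // true => resources are low
  }
  // ... start the periodic resource monitor thread afterwards ...
}
{code}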
bq. Could you fold the call to safeMode.setResourcesLow() on line 2891 into
enterSafeMode? Ie I think enterSafeMode(true) should always result in a call to
setResourcesLow.
Good point. Done.
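I.e., roughly this (a sketch only; the exact signature in the patch may differ):
{code:java}
// Sketch -- not the exact code in the patch.
void enterSafeMode(boolean resourcesLow) throws IOException {
  // ... existing enter-safe-mode logic ...
  if (resourcesLow) {
    // Callers no longer need to invoke safeMode.setResourcesLow() themselves;
    // passing true here is sufficient.
    safeMode.setResourcesLow();
  }
}
{code}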
bq. Perhaps include "resouce" in the "dfs.nn.du.reserved" and
"dfs.nn.checked.volumes" key names so it's clear there's a relationship between
them and "dfs.nn.resource.check.interval".
Changed to "{{dfs.namenode.resource.du.reserved}}" and
"{{dfs.namenode.resource.checked.volumes}}".
bq. In the NameNodeResourceChecker class header comment what do you mean by
"heap space available on all volumes"?
Good catch. This was a relic from the original version of the patch posted by
Devaraj. I removed this functionality, but missed the comment. Fixed.
bq. Would it be hard to write a test that crosses the threshold, eg set the
limit based on current available space minus say 500KB then create a large file
and assert the NN went into SM?
Nope. Done.
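The new test does something along these lines (a sketch of the approach described above, not the actual test in the patch; class names, timings, and the safe-mode check are assumptions):
{code:java}
// Sketch of a threshold-crossing test -- not the test added by the patch.
import static org.junit.Assert.assertTrue;

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DF;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.protocol.FSConstants;
import org.junit.Test;

public class TestNameNodeResourceThresholdSketch {

  @Test
  public void testNnEntersSafeModeWhenDiskFillsUp() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Reserve just under the space currently available on the partition that
    // hosts the test data dirs, so writing a large file crosses the threshold.
    // Assumes the working directory and the MiniDFSCluster storage share a
    // partition.
    long available = new DF(new File("."), conf).getAvailable();
    conf.setLong("dfs.namenode.resource.du.reserved", available - 500 * 1024);

    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs =
          (DistributedFileSystem) cluster.getFileSystem();

      // Consume local disk space, then give the resource monitor a chance
      // to notice that available space has dropped below the reserved amount.
      DFSTestUtil.createFile(fs, new Path("/big"), 4 * 1024 * 1024, (short) 1, 0L);
      Thread.sleep(5000);

      assertTrue("NN should be in safe mode once resources are low",
          fs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET));
    } finally {
      cluster.shutdown();
    }
  }
}
{code}
The sleep above is just a stand-in for however the real test waits for (or directly triggers) the resource check.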
{quote}
Nits:
FSNameSystem line 570: missing space after "if", and extra space after
"interrupt". line 4058 has extra parens.
isResourcesLow -> areResourcesLow or just resourcesLow
NameNodeResourceChecker line 118, <= should technically be <, or the message
should say something like the available space has reached the reserved amount.
In the test extra newline on line 68 ("in case of errors")
{quote}
Fixed.
> When the disk becomes full Namenode is getting shutdown and not able to recover
> -------------------------------------------------------------------------------
>
> Key: HDFS-1594
> URL: https://issues.apache.org/jira/browse/HDFS-1594
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.21.0, 0.21.1, 0.22.0
> Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28
> 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Devaraj K
> Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: HDFS-1594.patch, HDFS-1594.patch, HDFS-1594.patch,
> hadoop-root-namenode-linux124.log, hdfs-1594.0.patch, hdfs-1594.1.patch,
> hdfs-1594.2.patch, hdfs-1594.3.patch, hdfs-1594.4.patch, hdfs-1594.5.patch
>
>
> When the disk becomes full, the NameNode shuts down, and if we try to start it
> after making space available, it does not start and throws the exception below.
> {code:xml}
> 2011-01-24 23:23:33,727 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
> at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,729 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
> at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
> at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at linux124/10.18.52.124
> ************************************************************/
> {code}