[ 
https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-1594:
---------------------------------

    Attachment: hdfs-1594.5.patch

bq. In the meantime note that the TestStartup and TestEditLog failures are due 
to the patch (the NN is going into SM, looks like the threshold is getting 
crossed).

The tests were failing because of a bug, but not quite that one. The previous 
implementation waited for the NN resource monitoring thread to run before 
checking for available resources. In this patch, I've changed things around 
slightly so that an initial resource check is done at FSNamesystem 
initialization time.
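
To make the intent concrete, the initialization-time check boils down to comparing the usable space on each required volume against a configured reserved amount. Here is a minimal standalone sketch of that idea (the class name, method names, and the 100 MB figure are illustrative, not the patch's exact API):

{code:java}
// Minimal standalone sketch of the init-time resource check
// (illustrative names; not the patch's exact API).
import java.io.File;

public class NameNodeResourceCheckSketch {
  // Plays the role of dfs.namenode.resource.du.reserved (100 MB assumed here).
  static final long RESERVED_BYTES = 100L * 1024 * 1024;

  /** True if the volume holding the given directory still has more than the reserved space. */
  static boolean hasAvailableResources(File volume) {
    return volume.getUsableSpace() > RESERVED_BYTES;
  }

  public static void main(String[] args) {
    File editsVolume = new File(args.length > 0 ? args[0] : ".");
    if (!hasAvailableResources(editsVolume)) {
      // This is the point at which the NN would enter safe mode with
      // "resources low" set, rather than continuing to accept edits.
      System.err.println("Resources low on " + editsVolume.getAbsolutePath());
    }
  }
}
{code}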

bq. Could you fold the call to safeMode.setResourcesLow() on line 2891 into 
enterSafeMode? I.e., I think enterSafeMode(true) should always result in a call 
to setResourcesLow.

Good point. Done.
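
In other words, the boolean passed to enterSafeMode now implies the setResourcesLow call. Roughly, as a standalone illustration (SafeModeInfo here is a stand-in for the NN's internal class, not its real API):

{code:java}
// Standalone illustration of the folded behavior (stand-in classes, not the real NN code).
class SafeModeInfo {
  private boolean resourcesLow = false;
  void setResourcesLow() { resourcesLow = true; }
  boolean areResourcesLow() { return resourcesLow; }
}

class SafeModeExample {
  private final SafeModeInfo safeMode = new SafeModeInfo();

  /** Entering safe mode with resourcesLow == true always marks resources as low. */
  void enterSafeMode(boolean resourcesLow) {
    if (resourcesLow) {
      safeMode.setResourcesLow();
    }
    // ...the rest of the normal enter-safe-mode path would follow here.
  }
}
{code}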

bq. Perhaps include "resource" in the "dfs.nn.du.reserved" and 
"dfs.nn.checked.volumes" key names so it's clear there's a relationship between 
them and "dfs.nn.resource.check.interval".

Changed to "{{dfs.namenode.resource.du.reserved}}" and 
"{{dfs.namenode.resource.checked.volumes}}".

bq. In the NameNodeResourceChecker class header comment what do you mean by 
"heap space available on all volumes"?

Good catch. This was a relic from the original version of the patch posted by 
Devaraj. I removed this functionality, but missed the comment. Fixed.

bq. Would it be hard to write a test that crosses the threshold, eg set the 
limit based on current available space minus say 500KB then create a large file 
and assert the NN went into SM?

Nope. Done.
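
For the record, the shape of such a test is roughly the following. This is a sketch only, approximating the new test: the exact MiniDFSCluster setup, key names, and polling in the actual patch may differ, and it assumes the cluster's name/data dirs live on the same local volume as the working directory.

{code:java}
// Sketch of the threshold-crossing test (approximate; not the patch's exact test code).
import static org.junit.Assert.assertTrue;

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class TestNameNodeResourceThreshold {
  @Test
  public void testNameNodeEntersSafeModeWhenResourcesLow() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Reserve all but ~500KB of the space currently available on the test volume,
    // so a ~1MB write pushes usable space below the reserved amount.
    long usable = new File(".").getUsableSpace();
    conf.setLong("dfs.namenode.resource.du.reserved", usable - 500 * 1024);
    conf.setInt("dfs.namenode.resource.check.interval", 1000);

    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      // Writing this file consumes space on the same local volume as the NN dirs.
      DFSTestUtil.createFile(fs, new Path("/big-file"), 1024 * 1024, (short) 1, 0L);
      // Give the NN resource monitor a few check intervals to notice.
      Thread.sleep(3 * 1000);
      assertTrue("NN should have entered safe mode once resources became low",
          cluster.getNamesystem().isInSafeMode());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}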

{quote}
Nits:

FSNameSystem line 570: missing space after "if", and extra space after 
"interrupt". Line 4058 has extra parens.
isResourcesLow -> areResourcesLow or just resourcesLow.
NameNodeResourceChecker line 118: <= should technically be <, or the message 
should say something like "the available space has reached the reserved amount".
In the test, extra newline on line 68 ("in case of errors").
{quote}

Fixed.

> When the disk becomes full Namenode is getting shutdown and not able to 
> recover
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-1594
>                 URL: https://issues.apache.org/jira/browse/HDFS-1594
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.21.1, 0.22.0
>         Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28 
> 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Devaraj K
>            Assignee: Aaron T. Myers
>             Fix For: 0.23.0
>
>         Attachments: HDFS-1594.patch, HDFS-1594.patch, HDFS-1594.patch, 
> hadoop-root-namenode-linux124.log, hdfs-1594.0.patch, hdfs-1594.1.patch, 
> hdfs-1594.2.patch, hdfs-1594.3.patch, hdfs-1594.4.patch, hdfs-1594.5.patch
>
>
> When the disk becomes full, the NameNode shuts down, and if we try to start it 
> after making space available, it fails to start and throws the exception 
> below.
> {code} 
> 2011-01-24 23:23:33,727 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.
> java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:180)
>       at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,729 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:180)
>       at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:284)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:577)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:570)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
> 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG: 
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at linux124/10.18.52.124
> ************************************************************/
> {code} 
