Hi Matt,

If you want to keep your recent edits, you'll have to place a 0xFF byte at the
beginning of the corrupt entry in the edit log. It's a bit tough to find these
record boundaries by hand, but you can try applying this patch and rebuilding:

https://issues.apache.org/jira/browse/hdfs-1378

This will tell you the offset of the broken entry ("recent opcodes") and you
can put a 0xFF there to tie off the file before the corrupt entry.
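To illustrate what that one-byte edit looks like, here's a minimal sketch (not an official tool): it backs up the edits file and overwrites a single byte at the given offset with 0xFF, the OP_INVALID opcode that makes the loader stop reading. The path and offset below are examples only; use the real path from your logs and the offset reported by the patched namenode. Stop the namenode first.

```python
# Hedged sketch: stamp OP_INVALID (0xFF) at a given offset in the edits file
# so the edit-log loader treats everything from that point as end-of-log.
import shutil

def stamp_op_invalid(edits_path, offset):
    """Back up the edits file, then overwrite one byte at `offset` with 0xFF."""
    shutil.copyfile(edits_path, edits_path + ".bak")  # always keep a backup
    with open(edits_path, "r+b") as f:               # binary update, no truncation
        f.seek(offset)
        f.write(b"\xff")                             # OP_INVALID: end-of-log marker

# Example (hypothetical offset -- use the one the patched namenode prints):
# stamp_op_invalid("/mnt/name/current/edits", 156742)
```

Note this overwrites in place rather than truncating, which matches the "tie off" approach: the bytes after the marker remain on disk but are never replayed.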

-Todd


On Tue, Oct 5, 2010 at 8:16 AM, Matthew LeMieux <m...@mlogiciels.com> wrote:

> The namenode on an otherwise very stable HDFS cluster crashed recently.
> The filesystem on the namenode host filled up, which I assume is what
> caused the crash.  The problem has been fixed, but I cannot get the
> namenode to restart.  I am using version CDH3b2 (hadoop-0.20.2+320).
>
> The error is this:
>
> 2010-10-05 14:46:55,989 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /mnt/name/current/edits of size 157037 edits # 969 loaded in 0
> seconds.
> 2010-10-05 14:46:55,992 ERROR
> org.apache.hadoop.hdfs.server.namenode.NameNode:
> java.lang.NumberFormatException: For input string: 
> "128...@^@^...@^@^...@^@^...@^@"
>         at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Long.parseLong(Long.java:419)
>         at java.lang.Long.parseLong(Long.java:468)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1355)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:563)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1022)
>         ...
>
> This page (http://wiki.apache.org/hadoop/TroubleShooting) recommends
> editing the edits file with a hex editor, but does not explain where the
> record boundaries are.  The exception there is different, but it seemed to
> have a similar cause: a corrupt edits file.  I tried removing a line at a
> time, but the error persists, only with a smaller size and edits #:
>
> 2010-10-05 14:37:16,635 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /mnt/name/current/edits of size 156663 edits # 966 loaded in 0
> seconds.
> 2010-10-05 14:37:16,638 ERROR
> org.apache.hadoop.hdfs.server.namenode.NameNode:
> java.lang.NumberFormatException: For input string: 
> "128...@^@^...@^@^...@^@^...@^@"
>         at
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Long.parseLong(Long.java:419)
>         at java.lang.Long.parseLong(Long.java:468)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1355)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:563)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1022)
>         ...
>
> I tried removing the edits file altogether, but that failed
> with: java.io.IOException: Edits file is not found
>
> I then tried a zero-length edits file, so the namenode would at least find
> a file there, but that results in an NPE:
>
> 2010-10-05 14:52:34,775 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /mnt/name/current/edits of size 0 edits # 0 loaded in 0 seconds.
> 2010-10-05 14:52:34,776 ERROR
> org.apache.hadoop.hdfs.server.namenode.NameNode:
> java.lang.NullPointerException
>         at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1081)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1093)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:996)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:199)
>
>
> Most, if not all, of the files I noticed in the edits file are temporary
> files that will be deleted once this thing is back up and running anyway.
> There is a closed ticket that might be related:
> https://issues.apache.org/jira/browse/HDFS-686 , but the version I'm
> using seems to already include the HDFS-686 fix (according to
> http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/changes.html)
>
> What do I have to do to get back up and running?
>
> Thank you for your help,
>
> Matthew


-- 
Todd Lipcon
Software Engineer, Cloudera
