Hi Matt,

If you want to keep your recent edits, you'll have to place an 0xFF byte at the beginning of the most recent edit entry in the edit log. It's a bit tough to find these entry boundaries, but you can try applying this patch and rebuilding:
https://issues.apache.org/jira/browse/hdfs-1378

This will tell you the offset of the broken entry (in the "recent opcodes" output), and you can put an 0xFF there to tie off the file before the corrupt entry.

-Todd

On Tue, Oct 5, 2010 at 8:16 AM, Matthew LeMieux <m...@mlogiciels.com> wrote:
> The namenode on an otherwise very stable HDFS cluster crashed recently.
> The filesystem filled up on the namenode, which I assume is what caused
> the crash. The problem has been fixed, but I cannot get the namenode to
> restart. I am using version CDH3b2 (hadoop-0.20.2+320).
>
> The error is this:
>
> 2010-10-05 14:46:55,989 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /mnt/name/current/edits of size 157037 edits # 969 loaded in 0 seconds.
> 2010-10-05 14:46:55,992 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: "128...@^@^...@^@^...@^@^...@^@"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>     at java.lang.Long.parseLong(Long.java:419)
>     at java.lang.Long.parseLong(Long.java:468)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1355)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:563)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1022)
> ...
>
> This page (http://wiki.apache.org/hadoop/TroubleShooting) recommends
> editing the edits file with a hex editor, but does not explain where the
> record boundaries are. It describes a different exception, but the cause
> seemed similar: the edits file. I tried removing one line at a time, but
> the error persists, only with a smaller size and edits #:
>
> 2010-10-05 14:37:16,635 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /mnt/name/current/edits of size 156663 edits # 966 loaded in 0 seconds.
> 2010-10-05 14:37:16,638 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: "128...@^@^...@^@^...@^@^...@^@"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>     at java.lang.Long.parseLong(Long.java:419)
>     at java.lang.Long.parseLong(Long.java:468)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1355)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:563)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1022)
> ...
>
> I tried removing the edits file altogether, but that failed with:
> java.io.IOException: Edits file is not found
>
> I tried with a zero-length edits file, so there would at least be a file
> there, but that results in an NPE:
>
> 2010-10-05 14:52:34,775 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /mnt/name/current/edits of size 0 edits # 0 loaded in 0 seconds.
> 2010-10-05 14:52:34,776 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1081)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1093)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:996)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:199)
>
> Most, if not all, of the files I noticed in the edits file are temporary
> files that will be deleted once this thing gets back up and running anyway.
> There is a closed ticket that might be related:
> https://issues.apache.org/jira/browse/HDFS-686, but the version I'm
> using seems to already include HDFS-686 (according to
> http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320/changes.html).
>
> What do I have to do to get back up and running?
>
> Thank you for your help,
>
> Matthew

--
Todd Lipcon
Software Engineer, Cloudera
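Todd's "put an 0xFF at the offset" fix can be sketched in Python as below. This is a minimal, hypothetical helper, not part of Hadoop or of the HDFS-1378 patch. It assumes two things. First, you already have the byte offset of the broken entry from the patched namenode's output. Second, the 0.20-era loader stops reading when it sees an 0xFF opcode byte. `hexdump_window` prints the bytes around that offset so you can check the boundary by eye. `tie_off_edits` copies the edits file to a `.bak` file and then overwrites the single byte at the offset. Run it only while the namenode is stopped.

```python
import os
import shutil

# Assumption: in this 0.20-era edits format, an opcode byte of 0xFF
# (OP_INVALID) tells the loader to stop reading, so writing it at the start
# of the corrupt entry "ties off" the log before that entry.
OP_INVALID = 0xFF


def hexdump_window(path, offset, before=16, after=32):
    """Return a `hexdump -C`-style view of the bytes around `offset`."""
    start = max(0, offset - before)
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(offset - start + after)
    rows = []
    for i in range(0, len(data), 16):
        row = data[i:i + 16]
        hex_bytes = " ".join(f"{b:02x}" for b in row)
        # Printable ASCII as-is, everything else (e.g. NUL padding) as '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        rows.append(f"{start + i:08x}  {hex_bytes:<47}  |{text}|")
    return "\n".join(rows)


def tie_off_edits(path, offset, backup_suffix=".bak"):
    """Back up `path`, then overwrite the byte at `offset` with 0xFF.

    Returns the backup path. Refuses to clobber an existing backup so a
    second run cannot overwrite the only copy of the original file.
    """
    size = os.path.getsize(path)
    if not 0 <= offset < size:
        raise ValueError(f"offset {offset} is outside a {size}-byte file")
    backup = path + backup_suffix
    if os.path.exists(backup):
        raise FileExistsError(backup)
    shutil.copy2(path, backup)
    # "r+b" opens for in-place binary update without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([OP_INVALID]))
    return backup
```

For example, call `print(hexdump_window("/mnt/name/current/edits", offset))` and then `tie_off_edits("/mnt/name/current/edits", offset)`, where `offset` is the value taken from the "recent opcodes" output. The helper overwrites one byte instead of truncating the file, so everything after the offset stays on disk. Restoring the `.bak` copy undoes the change.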