Hello,

I am trying to recover a NameNode that failed, possibly using the checkpoint
node.
When I start DFS, I get the errors shown in the logs at the end of this email.
I think my metadata is corrupt, probably because the machine shut down while
Hadoop was checkpointing.
Note that this is a pseudo-distributed installation.
The contents of namedir are also listed at the end of this email.
I tried replacing the current fsimage with the checkpoint fsimage, removing
edits.new, and leaving an empty edits file. That gives me a working HDFS, but
its state is too old.
Do you have any suggestions for recovering the most recent fsimage, perhaps by
working with edits and edits.new?

Thanks very much in advance,

Juan

------------------------------------- content of namedir

ls -l -R /scratch/namedir/
/scratch/namedir/:
total 12
drwxr-xr-x 2 hadoop hadoop 4096 2012-03-22 22:06 current
drwxr-xr-x 2 hadoop hadoop 4096 2012-03-20 16:18 image
drwxr-xr-x 2 hadoop hadoop 4096 2012-03-20 17:28 previous.checkpoint

/scratch/namedir/current:
total 2168
-rw-r--r-- 1 hadoop hadoop    6417 2012-03-20 19:28 edits
-rw-r--r-- 1 hadoop hadoop 2094127 2012-03-22 17:25 edits.new
-rw-r--r-- 1 hadoop hadoop  105538 2012-03-20 18:28 fsimage
-rw-r--r-- 1 hadoop hadoop       8 2012-03-22 22:06 fstime
-rw-r--r-- 1 hadoop hadoop     101 2012-03-20 18:28 VERSION

/scratch/namedir/image:
total 4
-rw-r--r-- 1 hadoop hadoop 157 2012-03-20 18:28 fsimage

/scratch/namedir/previous.checkpoint:
total 160
-rw-r--r-- 1 hadoop hadoop 85345 2012-03-20 18:28 edits
-rw-r--r-- 1 hadoop hadoop 67295 2012-03-20 17:28 fsimage
-rw-r--r-- 1 hadoop hadoop     8 2012-03-20 17:28 fstime
-rw-r--r-- 1 hadoop hadoop   101 2012-03-20 17:28 VERSION
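For clarity, this is roughly what the rollback described above amounts to, written as a POSIX shell sketch against this layout. The function name `rollback_to_checkpoint` is hypothetical. The sketch assumes that "the checkpoint fsimage" is `previous.checkpoint/fsimage` from the listing above and that the NameNode is stopped. It copies the whole directory first because deleting `edits.new` discards every transaction logged after the checkpoint started (about 2 MB of edits here), which is presumably why the recovered state is too old.

```sh
#!/bin/sh
# Hypothetical helper: roll NameNode metadata back to the checkpoint image.
# Assumes the current/ and previous.checkpoint/ layout shown above and a
# stopped NameNode. The original directory is copied before anything changes.
set -eu

rollback_to_checkpoint() {
    namedir=$1
    backup="${namedir}.bak.$(date +%Y%m%d%H%M%S)"

    # Keep an untouched copy of the whole metadata directory.
    cp -a "$namedir" "$backup"

    # Replace the current image with the checkpointed one (assumed location).
    cp "$namedir/previous.checkpoint/fsimage" "$namedir/current/fsimage"

    # Drop the in-progress edit log and start from an empty edits file.
    rm -f "$namedir/current/edits.new"
    : > "$namedir/current/edits"

    # Report where the backup went.
    echo "$backup"
}
```

Usage: `rollback_to_checkpoint /scratch/namedir` prints the path of the backup copy, which still holds the original `edits` and `edits.new`.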


------------------------------------- logs when starting dfs

hadoop-hadoop-secondarynamenode-mymachine.log

2012-03-23 10:58:50,617 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
2012-03-23 10:58:51,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).

hadoop-hadoop-secondarynamenode-mymachine.log

2012-03-23 10:59:19,434 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
2012-03-23 10:59:20,434 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).

hadoop-hadoop-namenode-mymachine.log

2012-03-23 10:58:40,988 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException: Panic: parent does not exist
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1508)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1522)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1407)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:216)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:526)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1019)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:483)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:270)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:271)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:303)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:433)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:421)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1359)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)

2012-03-23 10:58:40,989 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at air099/127.0.1.1
************************************************************/