Hi Adam,

We have probably hit the same problem on CDH3b3. Running two secondary NNs 
corrupts the edits.new, even though it is supposed to be harmless. Everything 
runs fine as long as the NN stays up, but restarting it fails because of the 
corruption. We have reproduced this once more to verify. With only one 
secondary NN running, restarting works fine (also after a couple of days of 
operation).
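
If you still have both secondaries running, I would stop the extra one before 
doing anything else. Something like this (0.20-style daemon scripts assumed; 
adjust paths to your install):

  # on the host running the extra secondary namenode
  $HADOOP_HOME/bin/hadoop-daemon.sh stop secondarynamenode
  # and remove that host from conf/masters on the namenode,
  # otherwise start-dfs.sh will bring it back on the next restart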

If I am correct, your proposed solution would set you back to an image from 
about 15-30 minutes before the crash. Whether that works out depends on what 
you do with your HDFS (HBase, append-only workloads, etc.). In our case we are 
running HBase, and going back in time with the NN image is not very helpful, 
because splits and compactions remove and add files all the time. On 
append-only workloads, where you have the option of redoing whatever you did 
just before the crash, this could work. But please verify with someone with a 
better understanding of HDFS internals.

Also, there apparently is a way of healing a corrupt edits file using your 
favorite hex editor. There is a thread here: 
http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201010.mbox/%3caanlktinbhmn1x8dlir-c4ibhja9nh46tns588cqcn...@mail.gmail.com%3e
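
I have not tried it myself, but as I understand it the 0.20 loader stops at 
the first OP_INVALID (0xff) opcode, so overwriting everything from the first 
bad record to the end of the file with 0xff bytes effectively truncates the 
log just before the record that triggers the NPE. A rough sketch of that 
trick (the offset is hypothetical, and only ever do this on a copy):

  cp current/edits /backup/edits.bak        # never touch the only copy
  # find the byte offset of the bad record, e.g. by searching the hex
  # dump for the path that shows up in the NPE
  xxd current/edits | grep part-m-00291
  SIZE=$(stat -c%s current/edits)
  OFFSET=123456789                          # hypothetical, from the dump
  # pad everything from OFFSET to end-of-file with 0xff (OP_INVALID)
  dd if=/dev/zero bs=1 count=$((SIZE - OFFSET)) | tr '\0' '\377' \
    | dd of=current/edits bs=1 seek=$OFFSET conv=notrunc

Again, please verify with someone who knows the edits format before doing 
this on real metadata.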

There is a thread about this (our) problem on the cdh-user Google group. You 
could also try to post there.


Friso



On 12 Jan 2011, at 20:14, Adam Phelps wrote:

An update on this - around the time the secondary namenode crashed, we had 
been setting up the config to run a secondary namenode process on a machine 
separate from the namenode.  About 30 minutes before the crash we had added 
the new node to the conf/masters list and started the secondarynamenode on 
that node.  We have the checkpoint period set to 15 minutes, so it looks like 
one period was processed and then the original secondary failed at the next 
period.
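
For reference, the relevant config is roughly the following (property name as 
I believe it is on 0.20/CDH3b3; value in seconds):

  <property>
    <name>fs.checkpoint.period</name>
    <value>900</value>  <!-- checkpoint every 15 minutes -->
  </property>

with the new host added as a second line in conf/masters.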

The documentation appears to indicate that having multiple secondary 
namenodes is fine, but the timing here seems to indicate otherwise.

To recover here, would it simply be best to switch edits.new in for edits and 
then attempt to start the namenode?
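
Concretely, I was thinking of something like this (sketch only; the 
dfs.name.dir path is illustrative, and we would back everything up first):

  cd /data/dfs/name/current       # dfs.name.dir, illustrative path
  cp -a . /backup/name.current    # keep the damaged state around
  mv edits edits.unreplayable     # the segment the NN trips over
  mv edits.new edits              # replay only the later log instead
  # then start the namenode and run: hadoop fsck /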

Thanks
- Adam

On 1/12/11 10:43 AM, Adam Phelps wrote:
We were restarting the namenode and datanode processes on our cluster
(due to changing some configuration options), but the namenode failed
to restart with the error I've pasted below. (If it matters, we're
running the CDH3B3 release.)

From the looks of it, the files causing the problem were the output of an
MR job that was being run, and I guess the job's infrastructure was
renaming them from temporary output to the final output location.

We scanned through the hadoop logs and discovered that the secondary
namenode process had died the previous day with that same error (copied
that as well).

In the namenode's metadata directory, we see this:

./current:
total 546964
-rw-r--r-- 1 hdfs hdfs 477954043 2011-01-12 16:07 edits.new
-rw-r--r-- 1 hdfs hdfs 2865644 2011-01-11 00:46 edits
-rw-r--r-- 1 hdfs hdfs 8 2011-01-11 00:32 fstime
-rw-r--r-- 1 hdfs hdfs 100 2011-01-11 00:32 VERSION
-rw-r--r-- 1 hdfs hdfs 79595687 2011-01-11 00:32 fsimage

./previous.checkpoint:
total 77116
-rw-r--r-- 1 hdfs hdfs 1051329 2011-01-06 23:47 edits
-rw-r--r-- 1 hdfs hdfs 8 2011-01-06 23:40 fstime
-rw-r--r-- 1 hdfs hdfs 100 2011-01-06 23:40 VERSION
-rw-r--r-- 1 hdfs hdfs 78913323 2011-01-06 23:40 fsimage

The current/edits file contains references to the missing files, so it's
my assumption that there was some error that caused those files to be
deleted (we're still investigating this), leading to the failure of the
secondary namenode. So the secondary namenode process never rolled those
edits into the fsimage, and the namenode then started writing to edits.new.

Does anyone know how we'd go about getting the namenode running again?
We're fine with discarding those specific files, but would rather not
have to revert to the earlier image since it's now quite out of date (and
I imagine it could cause problems with other things we had running, such
as HBase).

Thanks
- Adam


Namenode error on restart:
2011-01-12 16:51:30,571 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename XXX/_temporary/_attempt_201101062358_2180_m_000291_0/part-m-00291.lzo to XXX/part-m-00291.lzo because source does not exist
2011-01-12 16:51:30,571 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename XXX/_temporary/_attempt_201101062358_2180_m_000327_0/part-m-00327.lzo to XXX/part-m-00327.lzo because source does not exist
2011-01-12 16:51:30,572 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1034)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:317)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:394)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157)

Secondary namenode error:
2011-01-11 00:46:35,935 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename XXX/_temporary/_attempt_201101062358_2180_m_000327_0/part-m-00327.lzo to XXX/part-m-00327.lzo because source does not exist
2011-01-11 00:46:35,936 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Throwable Exception in doCheckpoint:
2011-01-11 00:46:35,936 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1034)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:678)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$500(SecondaryNameNode.java:577)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:454)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:418)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
        at java.lang.Thread.run(Thread.java:662)



- Adam


