Hi,

For some reason my secondary namenode process died 10 days ago, and that has left me with both an edits and an edits.new file in my dfs/name/current directory. The fsimage file is also there, but it is old and does not contain the merged changes from either edits or edits.new. The cluster had been running fine since the last startup, which was two weeks ago.

Today I restarted the cluster, and now the namenode fails with a NullPointerException. The last saved checkpoint is the same size as the fsimage in the current directory, so replacing it will not help.

This is a test cluster, so the worst case is that I lose the changes that were never merged into the fsimage. I can remove edits.new and bring the cluster up with a clean edits file. I then have to force the namenode out of safe mode, after which fsck reports that HDFS is corrupt, with missing blocks/files, etc.
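For reference, the lossy fallback I just described looks roughly like this. This is only a sketch, not a tested procedure: the dfs.name.dir path is whatever your configuration says, and I am assuming it is safest to back everything up and move edits.new aside rather than delete it. The hadoop commands are guarded so the script does not blow up on a box without the hadoop binary.

```shell
# Sketch of the lossy fallback: back up metadata, discard the un-merged
# edits.new, then (after restarting the namenode) leave safe mode and
# assess the damage with fsck.
recover_namenode() {
    name_dir="$1"   # the namenode's dfs.name.dir, e.g. /var/hadoop/dfs/name

    # 1. Back up the whole name directory before touching anything.
    tar -czf "/tmp/name-backup-$$.tar.gz" \
        -C "$(dirname "$name_dir")" "$(basename "$name_dir")" || return 1

    # 2. Move the stale edits.new aside instead of deleting it outright,
    #    so it can still be inspected or restored later.
    mv "$name_dir/current/edits.new" "$name_dir/current/edits.new.bak"

    # 3. After restarting the namenode, force it out of safe mode and
    #    see what fsck reports (commands exist in Hadoop 0.20).
    if command -v hadoop >/dev/null 2>&1; then
        hadoop dfsadmin -safemode leave
        hadoop fsck /
    fi
}
```

Usage would be something like `recover_namenode /var/hadoop/dfs/name` with the namenode stopped, then start the namenode and let the last two commands run.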

The question I have is whether there is any way to salvage such a situation. I have read that one can perhaps tamper with the edits and edits.new files to bring up the namenode with minimal data loss. Would this require editing these files in a hex editor?

Is there any documentation or an example of how to do this, or is it perhaps not possible and not worth the effort? Either way, it would be good to know whether there is a way out of such a situation.

I have a 3-node test cluster running Hadoop 0.20.2+737.

I would appreciate any help or pointers.

Thanks,
Usman

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
