You should always keep fsimage and the edits log in more than one location, preferably on different disks.

A few months back I proposed keeping a checksum for each record in fsimage and the edits log, so that the NameNode could recover transparently from such corruption when more than one copy is available. It was not prioritized because no such failures had been observed.
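
To illustrate the per-record checksum idea, here is a rough sketch in Python. It is not the actual proposal and not NameNode code: the record framing (4-byte length, 4-byte CRC32, payload) and the names encode_record, decode_records and recover are made up for this example, and it assumes the copies hold the same records in the same order.

```python
import struct
import zlib


def encode_record(payload: bytes) -> bytes:
    """Frame one record as: 4-byte length, 4-byte CRC32, payload (hypothetical layout)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(">II", len(payload), crc) + payload


def decode_records(blob: bytes):
    """Return the payloads in order, with None for any record that fails its check.

    Limitation of this sketch: if a length field itself is damaged, the
    framing of the records after it in that copy is lost as well.
    """
    records = []
    pos = 0
    while pos + 8 <= len(blob):
        length, crc = struct.unpack_from(">II", blob, pos)
        payload = blob[pos + 8:pos + 8 + length]
        # A short read (truncated file) or a CRC mismatch marks the record bad.
        ok = len(payload) == length and (zlib.crc32(payload) & 0xFFFFFFFF) == crc
        records.append(payload if ok else None)
        pos += 8 + length
    return records


def recover(copies):
    """For each record, take the first copy whose checksum verifies."""
    decoded = [decode_records(c) for c in copies]
    count = max(len(d) for d in decoded)
    result = []
    for i in range(count):
        good = next(
            (d[i] for d in decoded if i < len(d) and d[i] is not None),
            None,
        )
        if good is None:
            raise ValueError(f"record {i} is corrupt in every copy")
        result.append(good)
    return result
```

With this kind of framing, a copy that was cut short by a full disk and another copy with a damaged record can still yield a complete log between them, as long as every record is intact in at least one copy.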

You should certainly report these cases; that will help the feature gain more traction.

Raghu.

Torsten Curdt wrote:
Just a bit of feedback here.

One of our Hadoop 0.16.4 namenodes ran into a full disk today. No second backup namenode was in place. Both files, fsimage and edits, seem to have been corrupted. After quite a bit of debugging and fiddling with a hex editor, we managed to resurrect the files and continue with only minor loss.

Thankfully this only happened on a development cluster, not on production. But isn't this something that should NEVER happen?

cheers
--
Torsten
