I had the same thing happen to me a few weeks ago. The fix was to modify one of the classes a bit (FSEdits.java or some such) and simply catch + swallow one of the exceptions. This let the NN come up again (at the expense of some data loss). Lohit helped me out and filed a bug. I don't have the issue number handy, but it is in JIRA and was still open as of a few days ago. NN HA seems to be a requirement for a lot of people... I suppose because it's (the only?) SPOF. :)
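
For anyone hitting the same thing, here is a rough sketch of the catch + swallow idea in Java. It is not the actual change and not Hadoop's real code: the class name EditReplaySketch, the record layout (one opcode byte followed by a UTF path), and both methods are made up for illustration. The point is only that a loader which stops at the first unreadable record, instead of propagating the exception, comes up with whatever it managed to read before the damage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an edit-log loader; NOT the real Hadoop class.
class EditReplaySketch {

    // Replays (opcode, path) records until the log ends or a record is unreadable.
    // A truncated or malformed record is swallowed, so everything after it is lost.
    static List<String> replay(byte[] log) {
        List<String> applied = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
            while (in.available() > 0) {
                byte opcode = in.readByte();
                String path = in.readUTF(); // throws EOFException on a cut-off record
                applied.add(opcode + ":" + path);
            }
        } catch (IOException e) {
            // Catch + swallow: keep what was applied so far instead of failing startup.
            System.err.println("Stopping replay at damaged record: " + e);
        }
        return applied;
    }

    // Builds a well-formed sample log in the made-up record layout above.
    static byte[] sampleLog(int records) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < records; i++) {
                out.writeByte(1);          // assumed opcode value
                out.writeUTF("/file" + i); // assumed payload: a path
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Cutting a couple of bytes off the end of sampleLog(3) and passing the result to replay returns the first two records and drops the third, which is the same trade-off as above: the node starts, but the tail of the log is gone.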

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Torsten Curdt <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, July 30, 2008 2:09:15 PM
> Subject: corrupted fsimage and edits
>
> Just a bit of a feedback here.
>
> One of our hadoop 0.16.4 namenodes had gotten a disk full incident
> today. No second backup namenode was in place. Both files fsimage and
> edits seem to have gotten corrupted. After quite a bit of debugging
> and fiddling with a hex editor we managed to resurrect the files and
> continue with just minor loss.
>
> Thankfully this only happened on a development cluster - not on
> production. But shouldn't that be something that should NEVER happen?
>
> cheers
> --
> Torsten
