I had the same thing happen to me a few weeks ago.  The solution was to modify 
one of the classes a bit (FSEdits.java or some such) and simply catch + swallow 
one of the exceptions.  This let the NN come up again (at the expense of some 
data loss).  Lohit helped me out and filed a bug.  I don't have the issue number 
handy, but it is in JIRA and was still open as of a few days ago.  NN HA seems to 
be a requirement for a lot of people... I suppose because it's (the only?) 
SPOF. :)
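
For anyone curious, the catch + swallow idea looks roughly like the sketch 
below.  This is NOT the actual Hadoop code or the actual patch: the class name, 
the method name and the record format (a 4-byte length followed by the payload) 
are all made up for illustration.  The point is only that replay stops at the 
first truncated record instead of failing startup, which is also where the data 
loss comes from.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an edit-log reader; not a Hadoop class.
public class TolerantEditReplay {

    // Assumed sanity limit for one record, chosen arbitrarily for this sketch.
    static final int MAX_RECORD_BYTES = 1 << 20;

    // Reads length-prefixed records until the stream ends or looks corrupted.
    // Everything after the first bad record is dropped.
    static List<byte[]> readRecords(DataInputStream in) throws IOException {
        List<byte[]> records = new ArrayList<>();
        while (true) {
            try {
                int len = in.readInt();
                if (len < 0 || len > MAX_RECORD_BYTES) {
                    // Implausible length: treat as corruption and stop replaying.
                    break;
                }
                byte[] payload = new byte[len];
                in.readFully(payload); // throws EOFException on a truncated record
                records.add(payload);
            } catch (EOFException e) {
                // Swallowed on purpose: either a clean end of the log or a
                // record cut short (for example by a full disk).
                break;
            }
        }
        return records;
    }
}
```

A real fix would at least log how many bytes were skipped, so you know how 
much you lost.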


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Torsten Curdt <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, July 30, 2008 2:09:15 PM
> Subject: corrupted fsimage and edits
> 
> Just a bit of feedback here.
> 
> One of our hadoop 0.16.4 namenodes had a disk-full incident  
> today. No second backup namenode was in place. Both files, fsimage and  
> edits, seem to have gotten corrupted. After quite a bit of debugging  
> and fiddling with a hex editor we managed to resurrect the files and  
> continue with just minor loss.
> 
> Thankfully this only happened on a development cluster - not on  
> production. But shouldn't that be something that should NEVER happen?
> 
> cheers
> --
> Torsten
