On 10/5/09 11:57 AM, "Malcolm Matalka" <[email protected]> wrote:
> Sadly I am not writing it to multiple files.  I will be now.  Do you have
> a link on information to best practices in this regard?

I know there are some references in my "Hadoop 24/7" Apachecon presentation
from last year. Does that count? ;)

http://wiki.apache.org/hadoop/NameNode is probably the best link on NameNode
configuration. We should probably set up a real "best practices" link rather
than having info scattered around the site.

> The upside is, the jobs I was running were all expendable so I can
> afford to lose what was written out.  Removing the edits file should
> only impact data I was writing, correct?

Any sort of changes, not just the data you were writing. [So permissions
changes, etc.]

>
> Thank you Allen

No problem. Good luck! :)

>
> -----Original Message-----
> From: Allen Wittenauer [mailto:[email protected]]
> Sent: Monday, October 05, 2009 14:52
> To: [email protected]; [email protected]
> Subject: Re: Recovering Corrupt FS Image on Amazon EBS
>
> On 10/5/09 11:41 AM, "Malcolm Matalka" <[email protected]> wrote:
>> In the event of an error, we bring all the instances down.  I then tried
>> to rerun the job (bringing all the instances back up and then attaching
>> to EBS volumes) and the namenode will not come up.  The logfile gives
>> the error at the bottom.  What are my options here to recover the file
>> system?
>
> Your edits file is corrupt.  You have some choices:
>
> A) if you ran a secondary and ran it frequently, hacking the edits off at
> the point of corruption will set the HDFS pretty close to the point of last
> run
>
> B) If you didn't run the secondary that often or you don't make that many
> changes, you may just want to ignore the edits file and bring up the HDFS
> without it.
>
> C) Check your other directory--you -are- writing fsimage and edits to two
> different dirs, right?  The other edits file may be healthier.
>
> But I suspect you're looking at data loss. :(
>
>> 2009-10-05 14:20:07,451 ERROR
>> org.apache.hadoop.hdfs.server.namenode.NameNode:
>> java.lang.NumberFormatException: For input string: ""
>>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>     at java.lang.Integer.parseInt(Integer.java:468)
>>     at java.lang.Short.parseShort(Short.java:120)
>>     at java.lang.Short.parseShort(Short.java:78)
>>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readShort(FSEditLog.java:1261)
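For reference, the exception at the top of that trace is what the JDK raises when Short.parseShort is handed an empty string. That is consistent with the NameNode reading an empty field where it expected a number in the edits file. Below is a minimal sketch of that JDK behaviour only; it is plain Java, not Hadoop code, and EmptyShortDemo and describe are made-up names for illustration.

```java
// Illustration of the JDK call at the top of the stack trace.
// Not Hadoop code: the class and method names here are hypothetical.
class EmptyShortDemo {

    // Parse a decimal short and report the outcome as text instead of throwing.
    static String describe(String field) {
        try {
            return "parsed " + Short.parseShort(field);
        } catch (NumberFormatException e) {
            return "NumberFormatException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("3")); // a well-formed field
        System.out.println(describe(""));  // an empty field, as in the log above
    }
}
```

The second call takes the same path as the trace (Short.parseShort into Integer.parseInt), so an empty field is enough to stop the edits replay at that record.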

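For option C above, a quick way to compare the copies is to look at the edits file in each configured name directory. The sketch below assumes the directories are given as a comma-separated list (the form dfs.name.dir takes in Hadoop releases of this era) and that each one keeps its edits file under current/edits; both the layout and the example paths are assumptions, so adjust them to your install. EditsCopies and report are hypothetical names, and the code only inspects files, it does not modify anything.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: lists the edits file found in each name directory.
class EditsCopies {

    // One line per directory: the edits path with its size and mtime, or a note that it is missing.
    static List<String> report(String commaSeparatedDirs) throws IOException {
        List<String> lines = new ArrayList<>();
        for (String dir : commaSeparatedDirs.split(",")) {
            Path edits = Paths.get(dir.trim(), "current", "edits"); // assumed on-disk layout
            if (Files.isRegularFile(edits)) {
                lines.add(edits + " size=" + Files.size(edits)
                        + " modified=" + Files.getLastModifiedTime(edits));
            } else {
                lines.add(edits + " missing");
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Example paths are placeholders; pass your own list as the first argument.
        String dirs = args.length > 0 ? args[0] : "/mnt/ebs1/name,/mnt/ebs2/name";
        for (String line : report(dirs)) {
            System.out.println(line);
        }
    }
}
```

A copy that is larger or more recently modified is not guaranteed to be intact, but a difference between the copies is a reasonable hint about which one to try first. Work on a backup of the directories either way.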