On 10/5/09 11:57 AM, "Malcolm Matalka" <[email protected]> wrote:

> Sadly I am not writing it to multiple files.  I will be now.  Do you have
> a link to information on best practices in this regard?

I know there are some references in my "Hadoop 24/7" ApacheCon presentation
from last year.  Does that count? ;)

http://wiki.apache.org/hadoop/NameNode  is probably the best link on
NameNode configuration.  We should probably set up a real "best practices"
link rather than having info scattered around the site.
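
Once fsimage and edits are being written to more than one directory, it helps
to be able to tell quickly whether the copies still agree.  The following is a
minimal sketch, not a Hadoop tool: the class and method names are hypothetical,
it uses only the JDK, and it assumes you pass it the paths of two copies of the
same file (for example the edits file from each name directory).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Hypothetical helper: compares two copies of a file by size and CRC32.
class EditsCopyCheck {

    // Streams the file through CRC32 so large files are not loaded into memory.
    static long crc32Of(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    // True only if both files have the same length and the same checksum.
    static boolean sameContent(Path a, Path b) throws IOException {
        return Files.size(a) == Files.size(b) && crc32Of(a) == crc32Of(b);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: EditsCopyCheck <copy1> <copy2>");
            return;
        }
        Path a = Paths.get(args[0]);
        Path b = Paths.get(args[1]);
        System.out.println(a + ": " + Files.size(a) + " bytes, crc32=" + crc32Of(a));
        System.out.println(b + ": " + Files.size(b) + " bytes, crc32=" + crc32Of(b));
        System.out.println(sameContent(a, b) ? "copies match" : "copies differ");
    }
}
```

A mismatch only says the copies differ, not which one is good; it is a prompt
to look closer before starting the NameNode, and any comparison should be done
on copies of the files with the NameNode stopped.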

> The upside is, the jobs I was running were all expendable so I can
> afford to lose what was written out.  Removing the edits file should
> only impact data I was writing, correct?

No.  You lose any sort of change recorded in the edits file, not just the
data you were writing. [So permissions changes, etc.]

> 
> Thank you Allen

No problem.  Good luck! :)

> 
> -----Original Message-----
> From: Allen Wittenauer [mailto:[email protected]]
> Sent: Monday, October 05, 2009 14:52
> To: [email protected]; [email protected]
> Subject: Re: Recovering Corrupt FS Image on Amazon EBS
>
> On 10/5/09 11:41 AM, "Malcolm Matalka" <[email protected]> wrote:
>> In the event of an error, we bring all the instances down.  I then tried
>> to rerun the job (bringing all the instances back up and then attaching
>> to EBS volumes) and the namenode will not come up.  The logfile gives
>> the error at the bottom.  What are my options here to recover the file
>> system?
> 
> Your edits file is corrupt.  You have some choices:
>
> A) if you ran a secondary and ran it frequently, hacking the edits off at
> the point of corruption will set the HDFS pretty close to the point of
> last run
>
> B) If you didn't run the secondary that often or you don't make that many
> changes, you may just want to ignore the edits file and bring up the HDFS
> without it.
>
> C) Check your other directory--you -are- writing fsimage and edits to two
> different dirs, right?  The other edits file may be healthier.
>
> But I suspect you're looking at data loss. :(
> 
>> 2009-10-05 14:20:07,451 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: ""
>>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>         at java.lang.Integer.parseInt(Integer.java:468)
>>         at java.lang.Short.parseShort(Short.java:120)
>>         at java.lang.Short.parseShort(Short.java:78)
>>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readShort(FSEditLog.java:1261)
> 
> 
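
The trace above shows FSEditLog.readShort ending up in Short.parseShort with an
empty input string.  As a minimal sketch of just that JDK behavior (this is not
Hadoop's FSEditLog code, and the class and method names are hypothetical),
parsing an empty string as a short always fails this way:

```java
// Hypothetical demo: shows that Short.parseShort rejects an empty string.
class EmptyShortDemo {

    // Returns true if the text cannot be parsed as a short.
    static boolean failsToParse(String text) {
        try {
            Short.parseShort(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("empty string fails: " + failsToParse(""));
        System.out.println("\"3\" fails: " + failsToParse("3"));
    }
}
```

So the exception itself only says that a field the reader expected to hold a
number came back empty at that point in the edits file.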
