Sadly I am not writing it to multiple files.  I will be from now on.  Do
you have a link to information on best practices in this regard?

The upside is that the jobs I was running were all expendable, so I can
afford to lose what was written out.  Removing the edits file should
only impact data I was writing, correct?

Thank you, Allen

-----Original Message-----
From: Allen Wittenauer [mailto:[email protected]] 
Sent: Monday, October 05, 2009 14:52
To: [email protected]; [email protected]
Subject: Re: Recovering Corrupt FS Image on Amazon EBS




On 10/5/09 11:41 AM, "Malcolm Matalka" <[email protected]>
wrote:
> In the event of an error, we bring all the instances down.  I then
> tried to rerun the job (bringing all the instances back up and then
> attaching to EBS volumes) and the namenode will not come up.  The
> logfile gives the error at the bottom.  What are my options here to
> recover the file system?

Your edits file is corrupt.   You have some choices:

A) If you ran a secondary and ran it frequently, hacking the edits off
at the point of corruption will set the HDFS pretty close to the point
of last run.
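To make option A concrete, here is a minimal sketch of truncating a file
at a known-good byte offset. The offset and filenames are purely
hypothetical, and a stand-in file is used here rather than a real HDFS
edits log; on a real cluster you would first locate the point of
corruption by inspecting the edits file (e.g. with a hex dump), and
always work on a copy.

```shell
# Sketch only: demonstrate truncation on a stand-in file, not a real
# edits log. The 8-byte offset is hypothetical; determine the real
# point of corruption from your own edits file first.
printf 'GOODDATABADTAIL' > edits        # stand-in for the edits log
cp edits edits.bak                      # always keep an untouched backup
dd if=edits.bak of=edits bs=1 count=8 2>/dev/null  # keep first 8 bytes
cat edits                               # only the "good" prefix remains
```

The namenode would then replay only the surviving prefix of the log,
which is why the result is "pretty close to" rather than exactly the
state at the last checkpoint.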

B) If you didn't run the secondary that often or you don't make that
many changes, you may just want to ignore the edits file and bring up
the HDFS without it.

C) Check your other directory--you -are- writing fsimage and edits to
two different dirs, right?  The other edits file may be healthier.
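For future clusters, the mechanism behind option C is that dfs.name.dir
in hdfs-site.xml accepts a comma-separated list of directories, and the
namenode mirrors fsimage and edits into each of them. A minimal sketch,
with example paths only (one entry is often put on an NFS mount or a
separate EBS volume for extra safety):

```xml
<!-- hdfs-site.xml: dfs.name.dir takes a comma-separated list of
     directories; the namenode writes fsimage and edits to each one.
     The paths below are illustrative, not recommendations. -->
<property>
  <name>dfs.name.dir</name>
  <value>/mnt/hdfs/name,/mnt2/hdfs/name</value>
</property>
```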

But I suspect you're looking at data loss. :(

> 2009-10-05 14:20:07,451 ERROR
> org.apache.hadoop.hdfs.server.namenode.NameNode:
> java.lang.NumberFormatException: For input string: ""
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Integer.parseInt(Integer.java:468)
>         at java.lang.Short.parseShort(Short.java:120)
>         at java.lang.Short.parseShort(Short.java:78)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readShort(FSEditLog.java:1261)

