OK, after googling around a bit, the solution seems to be either to delete the edits file, which in my case would not be cool (24MB worth of edits in there), or to truncate it correctly.
So I used the following script to figure out how much data needs to be
dropped:
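LEN=25497570    # starting length in bytes (roughly the 24MB of edits.org)
while true
do
    # copy only the first $LEN bytes of the backup into the live edits file
    dd if=edits.org of=edits bs="$LEN" count=1
    time hadoop namenode
    # exit status 255 is treated as the failed startup; anything else counts as success
    if [[ $? -ne 255 ]]
    then
        echo "$LEN seems to have worked."
        exit 0
    fi
    # drop one more byte from the end and try again
    LEN=$((LEN - 1))
done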
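
For anyone repeating this, here is a minimal sketch of just the truncation step, run on a throwaway file so nothing real gets damaged. It is not part of Hadoop: truncate_copy is a made-up helper name, the demo file is just zeros standing in for edits.org, and it assumes a head that supports -c (GNU and BSD do). The point is only that the original stays untouched while a copy is cut to the first N bytes.

```shell
#!/usr/bin/env bash
# Sketch only: cut a copy of a file down to its first N bytes,
# leaving the original (the backup) untouched.
# truncate_copy is a hypothetical helper, not a Hadoop tool.
truncate_copy() {
    local src=$1 dst=$2 len=$3
    # head -c writes the first $len bytes of $src to stdout
    head -c "$len" "$src" > "$dst"
}

# Demo on a throwaway file instead of a real edits log.
demo_dir=$(mktemp -d)
head -c 1000 /dev/zero > "$demo_dir/edits.org"

# Drop the last byte, as one iteration of the loop above would.
truncate_copy "$demo_dir/edits.org" "$demo_dir/edits" 999

orig_size=$(wc -c < "$demo_dir/edits.org" | tr -d ' ')
new_size=$(wc -c < "$demo_dir/edits" | tr -d ' ')
echo "original: $orig_size bytes, truncated copy: $new_size bytes"
```

On a real namenode you would first copy edits to edits.org and only ever write to edits, so the full log can always be restored.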
It might make sense to add something like this to
http://wiki.apache.org/hadoop/TroubleShooting
since not everyone will be able to figure out how to get rid of the "last"
incomplete record.
Another idea would be a tool or a namenode startup mode that ignores
EOFExceptions, so that as much of the edits file as possible is recovered.
Andreas
On Friday 05 September 2008 13:30:34 Andreas Kostyrka wrote:
> Hi!
>
> My namenode has run out of space, and now I'm getting the following:
>
> 08/09/05 09:23:22 WARN dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /data_v1/2008/06/26/12/pub1-access-2008-06-26-11_52_07.log.gz because it does not exist
> 08/09/05 09:23:22 INFO ipc.Server: Stopping server on 9000
> 08/09/05 09:23:22 ERROR dfs.NameNode: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
> at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
> at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:441)
> at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)
> at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)
> at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
> at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)
> at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:255)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:178)
> at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:164)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)
>
> 08/09/05 09:23:22 INFO dfs.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at
> ec2-67-202-42-251.compute-1.amazonaws.com/10.251.39.196
>
> hadoop-0.17.1 btw.
>
> What do I do now?
>
> Andreas
