Hi Arv,

It sounds like the edits log in your dfs.name.dir is corrupted: one of its
records was cut off when the disk filled up. When the namenode replays the
edit log, it tries to read that record in full and hits the end of the file
unexpectedly - hence the EOFException.
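
To illustrate the failure mode, here is a minimal sketch in Python. It is not
Hadoop's actual code. It assumes each string in a record is stored as a 2-byte
big-endian length followed by that many bytes, which is my understanding of
what UTF8.readFields does in your stack trace. The helper names and sample
paths are made up for illustration.

```python
import io
import struct


def read_fully(stream, n):
    """Read exactly n bytes or raise EOFError, like DataInputStream.readFully."""
    data = stream.read(n)
    if len(data) < n:
        raise EOFError(f"wanted {n} bytes, got {len(data)}")
    return data


def read_string(stream):
    """Read one length-prefixed string: 2-byte big-endian length, then the bytes."""
    (length,) = struct.unpack(">H", read_fully(stream, 2))
    return read_fully(stream, length).decode("utf-8")


def write_string(stream, text):
    """Write one length-prefixed string in the same assumed layout."""
    raw = text.encode("utf-8")
    stream.write(struct.pack(">H", len(raw)) + raw)


# Build a tiny "log" holding two strings, then cut it off mid-record,
# the way a full disk would.
buf = io.BytesIO()
write_string(buf, "/cs/raw/a.data")
write_string(buf, "/cs/raw/b.data")
cut = io.BytesIO(buf.getvalue()[:-5])

first = read_string(cut)   # the complete record reads fine
try:
    read_string(cut)       # the cut-off record runs out of bytes
    hit_eof = False
except EOFError:
    hit_eof = True
```

The length prefix promises more bytes than the file still holds, so the reader
fails partway through the last record rather than cleanly at a record boundary.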

Your options at this point are:

a) If you have a second copy of dfs.name.dir, it should also have a second
"edits" file. If that file is longer, it may not be corrupted.
I'd back up both copies, then copy the longer edit log into both name
dirs and try to start the namenode.

b) If you were running a secondary namenode, you should have a checkpoint of
the fsimage from a few hours before the failure. You can recover the fsimage
from there. You'll lose some time period's worth of metadata edits, but you
should be able to get the FS running again.

c) As a last-ditch attempt, you can truncate the edit log at the correct
offset so that you avoid the EOFException. This would probably
involve adding some logging statements to the FSEditLog replay so you can
see the byte offset of the last record it's trying to read, and then
truncating the edit log right before that offset. This is somewhat
complicated and I wouldn't attempt it unless you (a) really need the data
and (b) don't have any other option.
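
The file handling in options (a) and (c) could look roughly like the sketch
below, run with the namenode stopped. It is only an outline: the function
names and the ".bak" suffix are my own, and the offset passed to truncate_at
has to come from the logging described in (c). Nothing here can work that
offset out for you.

```python
import os
import shutil


def longer_edits(path_a, path_b):
    """Return whichever of the two edits files is larger (path_a on a tie)."""
    if os.path.getsize(path_a) >= os.path.getsize(path_b):
        return path_a
    return path_b


def backup(path, suffix=".bak"):
    """Copy path to path + suffix, refusing to overwrite an existing backup."""
    dest = path + suffix
    if os.path.exists(dest):
        raise FileExistsError(dest)
    shutil.copy2(path, dest)
    return dest


def truncate_at(path, offset):
    """Back up path, then cut it to offset bytes (the start of the bad record)."""
    size = os.path.getsize(path)
    if not 0 <= offset <= size:
        raise ValueError(f"offset {offset} is outside a file of {size} bytes")
    saved = backup(path)
    with open(path, "r+b") as f:
        f.truncate(offset)
    return saved
```

For (a), call backup on both edits files first, then copy the file returned by
longer_edits over the shorter one. For (c), truncate_at keeps the untouched
original next to the truncated file so you can retry with a different offset.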

-Todd

On Mon, Jul 20, 2009 at 12:27 PM, Arv Mistry <[email protected]> wrote:

> Hi,
>
> I'm getting the following error in starting up the namenode.
>
> One of our disks filled up. We reclaimed the
> disk space and tried to restart the Hadoop daemons, but the name node
> now won't start.
>
> Does anybody have any clues how to recover from this? I've tried
> searching through the Jira reports but found nothing obvious.
>
> Appreciate any input, thanks.
>
> Cheers Arv
>
> 2009-07-20 14:57:41,712 INFO org.apache.hadoop.dfs.NameNode:
> STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = qa-cs1/192.168.0.54
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.18.3-dev
> STARTUP_MSG:   build =  -r ; compiled by 'bamboo' on Mon Nov 10 15:58:40
> PST 2008
> ************************************************************/
> 2009-07-20 14:57:41,801 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
> Initializing RPC Metrics with hostName=NameNode, port=9000
> 2009-07-20 14:57:41,805 INFO org.apache.hadoop.dfs.NameNode: Namenode up
> at: 192.168.0.54/192.168.0.54:9000
> 2009-07-20 14:57:41,808
> INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2009-07-20 14:57:41,816 INFO org.apache.hadoop.dfs.NameNodeMetrics:
> Initializing NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2009-07-20 14:57:41,869 INFO org.apache.hadoop.fs.FSNamesystem:
> fsOwner=hadoopadmin,hadoopadmin
> 2009-07-20 14:57:41,869 INFO org.apache.hadoop.fs.FSNamesystem:
> supergroup=supergroup
> 2009-07-20 14:57:41,869 INFO org.apache.hadoop.fs.FSNamesystem:
> isPermissionEnabled=true
> 2009-07-20 14:57:41,877 INFO org.apache.hadoop.dfs.FSNamesystemMetrics:
> Initializing FSNamesystemMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 2009-07-20 14:57:41,878 INFO org.apache.hadoop.fs.FSNamesystem:
> Registered FSNamesystemStatusMBean
> 2009-07-20 14:57:41,908 INFO org.apache.hadoop.dfs.Storage: Number of
> files = 1808
> 2009-07-20 14:57:42,153 INFO org.apache.hadoop.dfs.Storage: Number of
> files under construction = 1
> 2009-07-20 14:57:42,157 INFO org.apache.hadoop.dfs.Storage: Image file
> of size 256399 loaded in 0 seconds.
> 2009-07-20 14:57:42,167 ERROR
> org.apache.hadoop.dfs.LeaseManager:
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605290.data
> not found in lease.paths
> (=[/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605294.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605298.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605303.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605328.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605335.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605337.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605340.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605346.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605401.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605432.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605451.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605464.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605487.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605499.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605539.data])
> 2009-07-20 14:57:42,167 ERROR
> org.apache.hadoop.dfs.LeaseManager:
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605294.data
> not found in lease.paths
> (=[/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605298.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605303.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605328.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605335.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605337.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605340.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605346.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605401.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605432.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605451.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605464.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605487.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605499.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605539.data])
> 2009-07-20 14:57:42,169 ERROR
> org.apache.hadoop.dfs.LeaseManager:
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605298.data
> not found in lease.paths
> (=[/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605303.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605328.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605335.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605337.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605340.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605346.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605401.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605290.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605294.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605432.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605451.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605464.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605487.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605499.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605539.data])
> 2009-07-20 14:57:42,169 ERROR
> org.apache.hadoop.dfs.LeaseManager:
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605303.data
> not found in lease.paths
> (=[/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605328.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605335.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605337.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605340.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605346.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605401.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605290.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605294.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605432.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605451.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605464.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605487.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605499.data,
> /opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605539.data])
> 2009-07-20 14:57:42,171 ERROR org.apache.hadoop.fs.FSNamesystem:
> FSNamesystem initialization failed.
> java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
>        at org.apache.hadoop.dfs.FSImage.readString(FSImage.java:1368)
>        at
> org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:447)
>        at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
>        at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
>        at
> org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
>        at
> org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
>        at
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:296)
>        at
> org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:275)
>
>
