The Offline Image Viewer (oiv) handles the fsimage file but not the edits log, so it wouldn't help in this case. There has been talk about writing a similar tool for the edits log, but nothing has been decided. Also, while the oiv will be included in 0.21, it works on images back to 0.18 (and maybe earlier). It's standalone, so it doesn't need a running cluster, just the fsimage file.
Option c will be very tricky and earns its place as the last-ditch effort.
-jg

Tom White wrote:
Is this an area where the Offline Image Viewer might be able to help
in the future? It's not available for 0.18.3, but seems like it would
be possible to extend it as a tool to help with c) in Todd's
description.

Tom

On Mon, Jul 20, 2009 at 8:30 PM, Todd Lipcon<[email protected]> wrote:
Hi Arv,

It sounds like your edits log in dfs.name.dir is corrupted since one of its
records got cut off by the disk filling up. When trying to replay the edit
log, it tries to read the entirety of that record and hits the end of file
unexpectedly - hence the EOFException.
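
To illustrate the failure mode described above, here is a minimal sketch of what happens when a reader expects a whole record and the file ends early. The record layout (a 2-byte big-endian length followed by the payload) and the names read_record and write_record are assumptions made up for this example; this is not the actual Hadoop edits log format or parser.

```python
import io
import struct


def write_record(payload):
    """Encode one toy record: 2-byte big-endian length, then the payload bytes."""
    return struct.pack(">H", len(payload)) + payload


def read_record(stream):
    """Read one toy record; return None at a clean end of the log."""
    header = stream.read(2)
    if len(header) == 0:
        return None  # the log ended exactly on a record boundary
    if len(header) < 2:
        raise EOFError("record header cut off")
    (length,) = struct.unpack(">H", header)
    payload = stream.read(length)
    if len(payload) < length:
        # Same situation as the readFully call in the stack trace:
        # the record announces more bytes than the file still holds.
        raise EOFError(f"expected {length} bytes, got {len(payload)}")
    return payload


# Two hypothetical records; the second one is cut off mid-write,
# as if the disk had filled up.
log = write_record(b"mkdir /a") + write_record(b"create /a/file")
truncated = log[:-5]
```

Reading `truncated` with read_record returns the first record normally and raises EOFError on the second, which is the pattern the namenode log shows during edit log replay.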

Your options at this point are:

a) If you have a second copy of dfs.name.dir, it should also have a second
"edits" file. If it's longer it's possible that that copy is not corrupted.
I'd back up both copies, then duplicate the longer edit log into both name
dirs and try to start the namenode.

b) If you were running a secondary namenode, you should have a checkpoint of
the fsimage from a few hours before the failure. You can recover the fsimage
from there. You'll lose some time period's worth of metadata edits, but you
should be able to get the FS running again.

c) The last-ditch option is to truncate the edit log at the correct
offset so that you avoid the EOFException. This would probably
involve adding some logging statements to the FSEditLog replay so you can
see the byte offset of the last record it's trying to read, and then
truncating the edit log right before that offset. This is somewhat
complicated and I wouldn't attempt it unless you (a) really need the data
and (b) don't have any other option.
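
As a rough sketch of the truncation idea in option c), the code below scans a file of length-prefixed records, remembers where the last complete record ends, and truncates a copy at that offset. It assumes a toy record layout (2-byte big-endian length, then payload), not the real edits log format; for a real edits log the offset would have to come from the FSEditLog replay itself, as described above. The function names are hypothetical.

```python
import shutil
import struct


def last_complete_offset(path):
    """Return the byte offset at which the first incomplete record starts."""
    good = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(2)
            if len(header) < 2:
                break  # clean end of file, or a header that was cut off
            (length,) = struct.unpack(">H", header)
            payload = f.read(length)
            if len(payload) < length:
                break  # payload cut off: this record is unusable
            good = f.tell()  # end of the last fully readable record
    return good


def truncate_copy(src, dst):
    """Copy src to dst, then truncate dst so it ends on a record boundary.

    The original file is never modified, so it doubles as the backup.
    """
    shutil.copyfile(src, dst)
    offset = last_complete_offset(dst)
    with open(dst, "r+b") as f:
        f.truncate(offset)
    return offset
```

Working on a copy is deliberate: if the computed offset turns out to be wrong, the untouched original is still there to try again.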

-Todd

On Mon, Jul 20, 2009 at 12:27 PM, Arv Mistry <[email protected]> wrote:

Hi,

I'm getting the following error in starting up the namenode.

What happened was that one of our disks filled up. We reclaimed the
disk space and tried to restart the hadoop daemons, but the name node
is now not starting up.

Does anybody have any clues how to recover from this? I've tried
searching through the Jira reports but nothing obvious.

Appreciate any input, thanks.

Cheers Arv

2009-07-20 14:57:41,712 INFO org.apache.hadoop.dfs.NameNode:
STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = qa-cs1/192.168.0.54
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.3-dev
STARTUP_MSG:   build =  -r ; compiled by 'bamboo' on Mon Nov 10 15:58:40
PST 2008
************************************************************/
2009-07-20 14:57:41,801 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=9000
2009-07-20 14:57:41,805 INFO org.apache.hadoop.dfs.NameNode: Namenode up
at: 192.168.0.54/192.168.0.54:9000
2009-07-20 14:57:41,808 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-07-20 14:57:41,816 INFO org.apache.hadoop.dfs.NameNodeMetrics:
Initializing NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-07-20 14:57:41,869 INFO org.apache.hadoop.fs.FSNamesystem:
fsOwner=hadoopadmin,hadoopadmin
2009-07-20 14:57:41,869 INFO org.apache.hadoop.fs.FSNamesystem:
supergroup=supergroup
2009-07-20 14:57:41,869 INFO org.apache.hadoop.fs.FSNamesystem:
isPermissionEnabled=true
2009-07-20 14:57:41,877 INFO org.apache.hadoop.dfs.FSNamesystemMetrics:
Initializing FSNamesystemMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
2009-07-20 14:57:41,878 INFO org.apache.hadoop.fs.FSNamesystem:
Registered FSNamesystemStatusMBean
2009-07-20 14:57:41,908 INFO org.apache.hadoop.dfs.Storage: Number of
files = 1808
2009-07-20 14:57:42,153 INFO org.apache.hadoop.dfs.Storage: Number of
files under construction = 1
2009-07-20 14:57:42,157 INFO org.apache.hadoop.dfs.Storage: Image file
of size 256399 loaded in 0 seconds.
2009-07-20 14:57:42,167 ERROR
org.apache.hadoop.dfs.LeaseManager:
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605290.data
not found in lease.paths
(=[/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605294.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605298.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605303.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605328.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605335.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605337.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605340.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605346.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605401.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605432.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605451.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605464.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605487.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605499.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605539.data])
2009-07-20 14:57:42,167 ERROR
org.apache.hadoop.dfs.LeaseManager:
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605294.data
not found in lease.paths
(=[/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605298.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605303.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605328.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605335.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605337.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605340.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605346.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605401.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605432.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605451.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605464.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605487.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605499.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605539.data])
2009-07-20 14:57:42,169 ERROR
org.apache.hadoop.dfs.LeaseManager:
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605298.data
not found in lease.paths
(=[/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605303.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605328.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605335.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605337.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605340.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605346.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605401.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605290.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605294.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605432.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605451.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605464.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605487.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605499.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605539.data])
2009-07-20 14:57:42,169 ERROR
org.apache.hadoop.dfs.LeaseManager:
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605303.data
not found in lease.paths
(=[/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605328.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605335.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605337.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605340.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605346.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_170000_1248113605401.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605290.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605294.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605432.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605451.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605464.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605487.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605499.data,
/opt/hadoop/data/disk1/cs/raw/20090720/cs_2_20090720_180000_1248113605539.data])
2009-07-20 14:57:42,171 ERROR org.apache.hadoop.fs.FSNamesystem:
FSNamesystem initialization failed.
java.io.EOFException
       at java.io.DataInputStream.readFully(DataInputStream.java:180)
       at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
       at org.apache.hadoop.dfs.FSImage.readString(FSImage.java:1368)
       at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:447)
       at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
       at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:675)
       at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:289)
       at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
       at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:296)
       at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:275)
