I am using Amazon EC2 with our HDFS on EBS volumes. While we were running a
job today, our EBS volumes apparently died without warning. As you can see,
the log file is even cut off:

2009-10-05 13:37:00,321 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel     ip=/10.244.195.64 cmd=open src=/user/root/reach.intermediate/20090928.1day/part-00058      dst=null perm=null
2009-10-05 13:37:01,901 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel     ip=/10.242.15.15 cmd=open        src=/user/root/reach.intermedi

When an error occurs, we bring all the instances down. I then tried to
rerun the job (bringing all the instances back up and reattaching the EBS
volumes), but the namenode will not come up. The error from its log file
is at the bottom of this message. What are my options for recovering the
file system?

Thanks,

Malcolm

/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ip-10-243-26-82/10.243.26.82
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
************************************************************/

2009-10-05 14:20:02,120 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=50001
2009-10-05 14:20:02,150 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ip-10-243-26-82.ec2.internal/10.243.26.82:50001
2009-10-05 14:20:02,154 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-10-05 14:20:02,254 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.ganglia.GangliaContext
2009-10-05 14:20:02,417 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
2009-10-05 14:20:02,417 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2009-10-05 14:20:02,417 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2009-10-05 14:20:02,435 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.ganglia.GangliaContext
2009-10-05 14:20:02,436 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2009-10-05 14:20:02,751 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 23989
2009-10-05 14:20:06,859 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2009-10-05 14:20:06,860 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 3800773 loaded in 4 seconds.

2009-10-05 14:20:07,451 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:468)
        at java.lang.Short.parseShort(Short.java:120)
        at java.lang.Short.parseShort(Short.java:78)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readShort(FSEditLog.java:1261)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:556)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:973)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:793)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:352)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)

2009-10-05 14:20:07,451 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-10-243-26-82/10.243.26.82
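
The trace shows that the image file itself loaded, and the failure happens
afterwards while replaying the edits log (FSImage.loadFSEdits calling
FSEditLog.loadFSEdits), where FSEditLog.readShort hands an empty string to
Short.parseShort. The Java sketch below reproduces only that last step in
isolation. The class name, the helper methods, and the 2-byte
length-prefixed string layout are assumptions made for illustration, not
Hadoop's actual code; the sketch just shows how a zero-length field (for
example, two zero bytes from a damaged or zero-filled stretch of the file)
produces exactly this exception message.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class EmptyShortDemo {
    // Hypothetical stand-in for a length-prefixed string reader:
    // an unsigned 2-byte length followed by that many bytes (assumed layout).
    static String readString(DataInputStream in) throws IOException {
        int len = in.readUnsignedShort();
        byte[] buf = new byte[len];
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Mirrors the frame seen in the trace: a short stored as text is parsed
    // with Short.parseShort, which rejects an empty string.
    static short readShort(DataInputStream in) throws IOException {
        return Short.parseShort(readString(in));
    }

    public static void main(String[] args) throws IOException {
        // Two zero bytes decode as a zero-length string.
        byte[] zeroed = new byte[2];
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(zeroed));
        try {
            readShort(in);
        } catch (NumberFormatException e) {
            // Prints: For input string: ""
            System.out.println(e.getMessage());
        }
    }
}
```

A well-formed field such as the bytes {0, 1, '3'} parses to 3 with the same
helper, so the exception points at the contents of the edits log rather
than at the parsing code.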
