I am using Amazon EC2 with our HDFS on EBS volumes. While running a job today, our EBS volumes apparently died out of nowhere. You can see the logfile is even cut off:
2009-10-05 13:37:00,321 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/10.244.195.64 cmd=open src=/user/root/reach.intermediate/20090928.1day/part-00058 dst=null perm=null
2009-10-05 13:37:01,901 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/10.242.15.15 cmd=open src=/user/root/reach.intermedi

In the event of an error, we bring all the instances down. I then tried to rerun the job (bringing all the instances back up and then reattaching the EBS volumes), but the namenode will not come up. The namenode log below shows the error near the end.

What are my options here to recover the file system?

Thanks,
Malcolm

/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ip-10-243-26-82/10.243.26.82
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.19.0
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r 713890; compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
************************************************************/
2009-10-05 14:20:02,120 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=50001
2009-10-05 14:20:02,150 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: ip-10-243-26-82.ec2.internal/10.243.26.82:50001
2009-10-05 14:20:02,154 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2009-10-05 14:20:02,254 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.ganglia.GangliaContext
2009-10-05 14:20:02,417 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
2009-10-05 14:20:02,417 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2009-10-05 14:20:02,417 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2009-10-05 14:20:02,435 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.ganglia.GangliaContext
2009-10-05 14:20:02,436 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2009-10-05 14:20:02,751 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 23989
2009-10-05 14:20:06,859 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2009-10-05 14:20:06,860 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 3800773 loaded in 4 seconds.
2009-10-05 14:20:07,451 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: ""
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:468)
        at java.lang.Short.parseShort(Short.java:120)
        at java.lang.Short.parseShort(Short.java:78)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readShort(FSEditLog.java:1261)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:556)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:973)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:793)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:352)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:290)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)

2009-10-05 14:20:07,451 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-10-243-26-82/10.243.26.82
