Given this is the third time this has come up in the past two days, I guess we need a new FAQ entry or three.
We also clearly need to update the quickstart so that it says:

a) Do not run a datanode on the namenode.
b) Make sure dfs.name.dir has at least two entries, one of them on a remote box (a rough example is at the bottom of this mail).
c) The slaves file has nothing to do with which nodes are part of HDFS; it is only read by the start/stop scripts, and any datanode daemon whose fs.default.name points at your namenode will register with it.

On Oct 6, 2010, at 1:56 PM, Patrick Marchwiak wrote:

> While I was copying files to hdfs, the hadoop fs client started to
> report errors. Digging into the datanode logs revealed [1] that I had
> run out of space on one of my datanodes. The namenode (running on the
> same machine as the failed datanode) died with a fatal error [2] when
> this happened and the logs seem to indicate some kind of corruption. I
> am unable to start up my namenode now due to the current state of hdfs
> [3].
>
> I stumbled upon HDFS-1378 which implies that manual editing of edit
> logs must be done to recover from this. How would one go about doing
> this? Are there any other options? Is this expected to happen when a
> datanode runs out of space during a copy? I'm not against wiping clean
> the data directories of each datanode and reformatting the namenode,
> if necessary.
>
> One other part of this scenario that I can't explain is why data was
> being written to this node in the first place. This machine was not
> listed in the slaves file yet it was still being treated as a
> datanode. I realize now that the datanode daemon should not have been
> started on this machine but I would imagine that it would be ignored
> by the client if it was not in the configuration.
>
> I'm running CDH3b2.
>
> Thanks,
> Patrick
>
>
> [1] datanode log when space ran out:
>
> 2010-10-06 10:30:22,995 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-5413202144274811562_223793 src: /128.115.210.46:34712 dest: /128.115.210.46:50010
> 2010-10-06 10:30:23,599 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskError: exception:
> java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:260)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:453)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:377)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
> 2010-10-06 10:30:23,617 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_-5413202144274811562_223793
> org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: No space left on device
>
> [2] namenode log after space ran out:
>
> 2010-10-06 10:31:03,675 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Unable to sync edit log. Fatal Error.
> 2010-10-06 10:31:03,675 FATAL org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All storage directories are inaccessible.
>
> [3] namenode log error during startup:
> 2010-10-06 10:46:35,889 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
> java.io.IOException: Incorrect data format. logVersion is -18 but writables.length is 0.
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:556)
> ....
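Re (b): dfs.name.dir takes a comma-separated list of directories, and the namenode keeps a full copy of its fsimage and edit log in every one of them, so a single disk filling up or dying doesn't leave you with no usable copy of the metadata. Roughly something like this in hdfs-site.xml -- the paths below are just placeholders, point the second one at an NFS mount exported from a different machine:

  <property>
    <name>dfs.name.dir</name>
    <!-- local disk first, then a directory NFS-mounted from another box;
         the namenode writes its metadata to each directory listed here -->
    <value>/data/1/dfs/nn,/mnt/nfs/othernode/dfs/nn</value>
  </property>

With two or more entries, a disk that fills up just gets dropped from the active storage list; with only one entry you end up with the "All storage directories are inaccessible" fatal seen in [2].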