Hi,

did you set the HDFS-related directories outside of /tmp? Most *nix systems clean /tmp up on reboot.
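One way to check this is sketched below. It is only a rough sketch and assumes the Hadoop 0.20.x property names hadoop.tmp.dir, dfs.name.dir and dfs.data.dir, with the stock defaults (hadoop.tmp.dir = /tmp/hadoop-${user.name}, and the dfs dirs derived from it). The function names are made up for illustration; adjust the property list for your version.

```python
import xml.etree.ElementTree as ET

# Property names as used by Hadoop 0.20.x (assumption: adjust for other versions).
DIR_PROPERTIES = ("hadoop.tmp.dir", "dfs.name.dir", "dfs.data.dir")
# Stock default used when hadoop.tmp.dir is not overridden.
DEFAULT_TMP_DIR = "/tmp/hadoop-${user.name}"


def parse_properties(xml_text):
    """Return {name: value} from the text of a Hadoop *-site.xml file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def dirs_under_tmp(props):
    """Return {property: [paths]} for storage dirs that resolve under /tmp."""
    tmp_dir = props.get("hadoop.tmp.dir", DEFAULT_TMP_DIR)
    defaults = {
        "hadoop.tmp.dir": tmp_dir,
        "dfs.name.dir": "${hadoop.tmp.dir}/dfs/name",
        "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
    }
    flagged = {}
    for name in DIR_PROPERTIES:
        raw = props.get(name, defaults[name])
        # dfs.name.dir and dfs.data.dir may hold a comma-separated list.
        for path in raw.split(","):
            path = path.strip().replace("${hadoop.tmp.dir}", tmp_dir)
            if path == "/tmp" or path.startswith("/tmp/"):
                flagged.setdefault(name, []).append(path)
    return flagged
```

Feed it the merged properties from core-site.xml and hdfs-site.xml on each node; an empty result means none of the three directories resolves to a location under /tmp.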
- Alex

On Tue, Dec 27, 2011 at 2:09 PM, Rajat Goel <[email protected]> wrote:

> Hi,
>
> I have a 7-node setup (1 - Namenode/JobTracker, 6 - Datanodes/TaskTrackers)
> running Hadoop version 0.20.203.
>
> I performed the following test:
> Initially cluster is running smoothly. Just before launching a MapReduce
> job (about one or two minutes before), I shutdown one of the data nodes
> (rebooted the machine). Then my MapReduce job starts but immediately fails
> with following messages on stderr:
>
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> NOTICE: Configuration: /device.map /region.map /url.map
> /data/output/2011/12/26/08
> PS:192.168.100.206:11111 3600 true Notice
> 11/12/26 09:10:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 11/12/26 09:10:26 INFO input.FileInputFormat: Total input paths to process : 24
> 11/12/26 09:10:37 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> 11/12/26 09:10:37 INFO hdfs.DFSClient: Abandoning block blk_-6309642664478517067_35619
> 11/12/26 09:10:37 INFO hdfs.DFSClient: Waiting to find target node: 192.168.100.7:50010
> 11/12/26 09:10:44 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.NoRouteToHostException: No route to host
> 11/12/26 09:10:44 INFO hdfs.DFSClient: Abandoning block blk_4129088682008611797_35619
> 11/12/26 09:10:53 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> 11/12/26 09:10:53 INFO hdfs.DFSClient: Abandoning block blk_3596375242483863157_35619
> 11/12/26 09:11:01 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> 11/12/26 09:11:01 INFO hdfs.DFSClient: Abandoning block blk_724369205729364853_35619
> 11/12/26 09:11:07 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3002)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>
> 11/12/26 09:11:07 WARN hdfs.DFSClient: Error Recovery for block blk_724369205729364853_35619 bad datanode[1] nodes == null
> 11/12/26 09:11:07 WARN hdfs.DFSClient: Could not get block locations. Source file "/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split" - Aborting...
> 11/12/26 09:11:07 INFO mapred.JobClient: Cleaning up the staging area hdfs://machine-100-205:9000/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292
> Exception in thread "main" java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
> 11/12/26 09:11:07 ERROR hdfs.DFSClient: Exception closing file /data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split : java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>
>
> - In the above logs, 192.168.100.5 is the machine I rebooted.
> - JobTracker's log file doesn't have any logs in the above time period.
> - NameNode's log file doesn't have any exceptions or any messages related
>   to the above error logs.
> - All nodes can access each other via IP or hostnames.
> - ulimit values for files is set to 1024 but I don't see many connections
>   in CLOSE_WAIT state (Googled a bit and some ppl suggest that this value
>   could be a culprit in some cases)
> - My Hadoop configuration files have settings for no. of mappers (8),
>   reducers (4), io.sort.mb (512 mb). Most of the other parameters have been
>   configured to their default values.
>
> Can someone please provide any pointers to solution of this problem?
>
> Thanks,
> Rajat

--
Alexander Lorenz
http://mapredit.blogspot.com
