Is the DN you've just rebooted connecting to the NN? Most likely the datanode daemon isn't running; check it:

ps waux | grep "DataNode" | grep -v "grep"
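If it isn't running, bring it back up and confirm it re-registers with the NN. A rough sequence, assuming a stock 0.20.x tarball layout with HADOOP_HOME set (adjust paths to your install; 192.168.100.5 is the rebooted node from your logs):

  # restart the DataNode daemon on the rebooted node
  $HADOOP_HOME/bin/hadoop-daemon.sh start datanode

  # from any node: does the NameNode report it as live again?
  $HADOOP_HOME/bin/hadoop dfsadmin -report | grep -A 1 "Name: 192.168.100.5"

  # if it won't start, the DataNode log usually says why
  tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

Until the DN re-registers, the NN may still hand it out as a write-pipeline target, which would match the "Bad connect ack with firstBadLink as 192.168.100.5:50010" errors in your trace.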
- Alex

On Tue, Dec 27, 2011 at 2:44 PM, Rajat Goel <[email protected]> wrote:
> Yes. HDFS- and MapRed-related dirs are set outside of /tmp.
>
> On Tue, Dec 27, 2011 at 6:48 PM, alo alt <[email protected]> wrote:
>
>> Hi,
>>
>> did you set the HDFS-related dirs outside of /tmp? Most *ux systems
>> clean them up on reboot.
>>
>> - Alex
>>
>> On Tue, Dec 27, 2011 at 2:09 PM, Rajat Goel <[email protected]> wrote:
>> > Hi,
>> >
>> > I have a 7-node setup (1 NameNode/JobTracker, 6 DataNodes/TaskTrackers)
>> > running Hadoop version 0.20.203.
>> >
>> > I performed the following test:
>> > Initially the cluster is running smoothly. About one or two minutes
>> > before launching a MapReduce job, I shut down one of the data nodes
>> > (rebooted the machine). The job then starts but immediately fails
>> > with the following messages on stderr:
>> >
>> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
>> > Please use org.apache.hadoop.log.metrics.EventCounter in all the
>> > log4j.properties files.
>> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
>> > Please use org.apache.hadoop.log.metrics.EventCounter in all the
>> > log4j.properties files.
>> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
>> > Please use org.apache.hadoop.log.metrics.EventCounter in all the
>> > log4j.properties files.
>> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
>> > Please use org.apache.hadoop.log.metrics.EventCounter in all the
>> > log4j.properties files.
>> > NOTICE: Configuration: /device.map /region.map /url.map
>> > /data/output/2011/12/26/08
>> > PS:192.168.100.206:11111 3600 true Notice
>> > 11/12/26 09:10:26 WARN mapred.JobClient: Use GenericOptionsParser for
>> > parsing the arguments. Applications should implement Tool for the same.
>> > 11/12/26 09:10:26 INFO input.FileInputFormat: Total input paths to process : 24
>> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Abandoning block blk_-6309642664478517067_35619
>> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Waiting to find target node: 192.168.100.7:50010
>> > 11/12/26 09:10:44 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> > java.net.NoRouteToHostException: No route to host
>> > 11/12/26 09:10:44 INFO hdfs.DFSClient: Abandoning block blk_4129088682008611797_35619
>> > 11/12/26 09:10:53 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> > 11/12/26 09:10:53 INFO hdfs.DFSClient: Abandoning block blk_3596375242483863157_35619
>> > 11/12/26 09:11:01 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> > 11/12/26 09:11:01 INFO hdfs.DFSClient: Abandoning block blk_724369205729364853_35619
>> > 11/12/26 09:11:07 WARN hdfs.DFSClient: DataStreamer Exception:
>> > java.io.IOException: Unable to create new block.
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3002)
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>> >
>> > 11/12/26 09:11:07 WARN hdfs.DFSClient: Error Recovery for block
>> > blk_724369205729364853_35619 bad datanode[1] nodes == null
>> > 11/12/26 09:11:07 WARN hdfs.DFSClient: Could not get block locations. Source file
>> > "/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split"
>> > - Aborting...
>> > 11/12/26 09:11:07 INFO mapred.JobClient: Cleaning up the staging area
>> > hdfs://machine-100-205:9000/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292
>> > Exception in thread "main" java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>> > 11/12/26 09:11:07 ERROR hdfs.DFSClient: Exception closing file
>> > /data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split :
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>> >
>> > - In the above logs, 192.168.100.5 is the machine I rebooted.
>> > - The JobTracker's log file has no entries in the above time period.
>> > - The NameNode's log file has no exceptions or any messages related
>> >   to the above errors.
>> > - All nodes can reach each other via IP or hostname.
>> > - The ulimit value for open files is set to 1024, but I don't see many
>> >   connections in the CLOSE_WAIT state (Googled a bit; some people
>> >   suggest this value can be the culprit in some cases).
>> > - My Hadoop configuration files set the number of mappers (8), reducers
>> >   (4), and io.sort.mb (512 MB). Most other parameters are left at their
>> >   default values.
>> >
>> > Can someone please provide any pointers to a solution for this problem?
>> >
>> > Thanks,
>> > Rajat
>>
>> --
>> Alexander Lorenz
>> http://mapredit.blogspot.com
>>
>> Think of the environment: please don't print this email unless you
>> really need to.

--
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you really need to.
