Hi,

Did you set the HDFS-related dirs outside of /tmp? Most *nix systems
clean /tmp on reboot, so a DataNode that keeps its blocks there comes
back empty.
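
For 0.20.x that usually means overriding hadoop.tmp.dir and the dfs
dirs. A minimal sketch (the /data/hadoop paths below are placeholders,
not a recommendation for your layout):

  <!-- core-site.xml: many Hadoop paths default under hadoop.tmp.dir -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>

  <!-- hdfs-site.xml: persistent NameNode metadata and DataNode blocks -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/dfs/data</value>
  </property>

Move the existing data over before restarting the daemons, otherwise
the DataNodes come up with empty storage again.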

- Alex

On Tue, Dec 27, 2011 at 2:09 PM, Rajat Goel <[email protected]> wrote:
> Hi,
>
> I have a 7-node setup (1 - Namenode/JobTracker, 6 - Datanodes/TaskTrackers)
> running Hadoop version 0.20.203.
>
> I performed the following test:
> Initially cluster is running smoothly. Just before launching a MapReduce
> job (about one or two minutes before), I shutdown one of the data nodes
> (rebooted the machine). Then my MapReduce job starts but immediately fails
> with following messages on stderr:
>
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> files.
> [the WARNING above is printed four times in total]
> NOTICE: Configuration: /device.map    /region.map    /url.map
> /data/output/2011/12/26/08
>  PS:192.168.100.206:11111    3600    true    Notice
> 11/12/26 09:10:26 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 11/12/26 09:10:26 INFO input.FileInputFormat: Total input paths to process
> : 24
> 11/12/26 09:10:37 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as
> 192.168.100.5:50010
> 11/12/26 09:10:37 INFO hdfs.DFSClient: Abandoning block
> blk_-6309642664478517067_35619
> 11/12/26 09:10:37 INFO hdfs.DFSClient: Waiting to find target node:
> 192.168.100.7:50010
> 11/12/26 09:10:44 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> java.net.NoRouteToHostException: No route to host
> 11/12/26 09:10:44 INFO hdfs.DFSClient: Abandoning block
> blk_4129088682008611797_35619
> 11/12/26 09:10:53 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as
> 192.168.100.5:50010
> 11/12/26 09:10:53 INFO hdfs.DFSClient: Abandoning block
> blk_3596375242483863157_35619
> 11/12/26 09:11:01 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as
> 192.168.100.5:50010
> 11/12/26 09:11:01 INFO hdfs.DFSClient: Abandoning block
> blk_724369205729364853_35619
> 11/12/26 09:11:07 WARN hdfs.DFSClient: DataStreamer Exception:
> java.io.IOException: Unable to create new block.
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3002)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>
> 11/12/26 09:11:07 WARN hdfs.DFSClient: Error Recovery for block
> blk_724369205729364853_35619 bad datanode[1] nodes == null
> 11/12/26 09:11:07 WARN hdfs.DFSClient: Could not get block locations.
> Source file
> "/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split"
> - Aborting...
> 11/12/26 09:11:07 INFO mapred.JobClient: Cleaning up the staging area
> hdfs://machine-100-205:9000/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292
> Exception in thread "main" java.io.IOException: Bad connect ack with
> firstBadLink as 192.168.100.5:50010
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
> 11/12/26 09:11:07 ERROR hdfs.DFSClient: Exception closing file
> /data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split
> : java.io.IOException: Bad connect ack with firstBadLink as
> 192.168.100.5:50010
> java.io.IOException: Bad connect ack with firstBadLink as
> 192.168.100.5:50010
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>    at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>
>
> - In the above logs, 192.168.100.5 is the machine I rebooted.
> - The JobTracker's log file has no entries in the above time period.
> - The NameNode's log file has no exceptions or messages related to the
> above errors.
> - All nodes can reach each other by IP or hostname.
> - The ulimit for open files is set to 1024, but I don't see many
> connections in the CLOSE_WAIT state (I Googled a bit, and some people
> suggest this value can be the culprit in such cases; a quick check is
> sketched after this list).
> - My Hadoop configuration sets the number of mappers (8), reducers (4),
> and io.sort.mb (512 MB); most other parameters are left at their
> default values.
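>
> For the ulimit point above, a quick way to check both numbers on a
> node is something like this (exact netstat flags vary by distro):
>
>   ulimit -n
>   netstat -tan | grep -c CLOSE_WAIT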
>
> Can someone please provide pointers to a solution for this problem?
>
> Thanks,
> Rajat



-- 
Alexander Lorenz
http://mapredit.blogspot.com

