Yes. The HDFS- and MapReduce-related dirs are set outside of /tmp (see the config sketch after the quoted thread below).

On Tue, Dec 27, 2011 at 6:48 PM, alo alt <[email protected]> wrote:
> Hi,
>
> did you set the HDFS-related dirs outside of /tmp? Most *ux systems
> clean them up on reboot.
>
> - Alex
>
> On Tue, Dec 27, 2011 at 2:09 PM, Rajat Goel <[email protected]> wrote:
> > Hi,
> >
> > I have a 7-node setup (1 Namenode/JobTracker, 6 Datanodes/TaskTrackers)
> > running Hadoop version 0.20.203.
> >
> > I performed the following test: initially the cluster is running smoothly.
> > Just before launching a MapReduce job (about one or two minutes before),
> > I shut down one of the data nodes (rebooted the machine). My MapReduce job
> > then starts but immediately fails with the following messages on stderr:
> >
> > WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
> > use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> > files. [repeated four times in the original output]
> > NOTICE: Configuration: /device.map /region.map /url.map /data/output/2011/12/26/08
> > PS:192.168.100.206:11111 3600 true Notice
> > 11/12/26 09:10:26 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the same.
> > 11/12/26 09:10:26 INFO input.FileInputFormat: Total input paths to process : 24
> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Abandoning block blk_-6309642664478517067_35619
> > 11/12/26 09:10:37 INFO hdfs.DFSClient: Waiting to find target node: 192.168.100.7:50010
> > 11/12/26 09:10:44 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> > java.net.NoRouteToHostException: No route to host
> > 11/12/26 09:10:44 INFO hdfs.DFSClient: Abandoning block blk_4129088682008611797_35619
> > 11/12/26 09:10:53 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> > 11/12/26 09:10:53 INFO hdfs.DFSClient: Abandoning block blk_3596375242483863157_35619
> > 11/12/26 09:11:01 INFO hdfs.DFSClient: Exception in createBlockOutputStream
> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> > 11/12/26 09:11:01 INFO hdfs.DFSClient: Abandoning block blk_724369205729364853_35619
> > 11/12/26 09:11:07 WARN hdfs.DFSClient: DataStreamer Exception:
> > java.io.IOException: Unable to create new block.
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3002)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
> >
> > 11/12/26 09:11:07 WARN hdfs.DFSClient: Error Recovery for block
> > blk_724369205729364853_35619 bad datanode[1] nodes == null
> > 11/12/26 09:11:07 WARN hdfs.DFSClient: Could not get block locations.
> > Source file "/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split"
> > - Aborting...
> > 11/12/26 09:11:07 INFO mapred.JobClient: Cleaning up the staging area
> > hdfs://machine-100-205:9000/data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292
> > Exception in thread "main" java.io.IOException: Bad connect ack with
> > firstBadLink as 192.168.100.5:50010
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
> > 11/12/26 09:11:07 ERROR hdfs.DFSClient: Exception closing file
> > /data/hadoop-admin/mapred/staging/admin/.staging/job_201112200923_0292/job.split :
> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> > java.io.IOException: Bad connect ack with firstBadLink as 192.168.100.5:50010
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:3068)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2983)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
> >
> > - In the above logs, 192.168.100.5 is the machine I rebooted.
> > - The JobTracker's log file has no entries in the above time period.
> > - The NameNode's log file has no exceptions or any messages related to the above errors.
> > - All nodes can reach each other via IP or hostname.
> > - The ulimit value for open files is set to 1024, but I don't see many connections
> > in the CLOSE_WAIT state (Googled a bit, and some people suggest this value could
> > be a culprit in some cases).
> > - My Hadoop configuration files set the number of mappers (8), reducers (4), and
> > io.sort.mb (512 MB). Most of the other parameters are at their default values.
> >
> > Can someone please provide any pointers to a solution for this problem?
> >
> > Thanks,
> > Rajat
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
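For reference, a minimal sketch of the properties that keep those dirs off /tmp on a 0.20-era cluster. The /data paths are placeholders for whatever local disks you actually use:

    <!-- core-site.xml: hadoop.tmp.dir is the default base for many other
         paths, so moving it off /tmp avoids losing state on reboot -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/data/hadoop-tmp</value>   <!-- placeholder path -->
    </property>

    <!-- hdfs-site.xml: NameNode metadata and DataNode block storage -->
    <property>
      <name>dfs.name.dir</name>
      <value>/data/dfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/data/dfs/data</value>
    </property>

    <!-- mapred-site.xml: local scratch space for MapReduce tasks -->
    <property>
      <name>mapred.local.dir</name>
      <value>/data/mapred/local</value>
    </property>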
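On the failure itself: the repeated "Bad connect ack with firstBadLink as 192.168.100.5:50010" lines point at the rebooted node, which suggests the NameNode was still handing that node out in write pipelines. A DataNode is only declared dead after several minutes of missed heartbeats, so a job submitted a minute or two after the reboot can still be routed to it. A quick way to check the NameNode's view of the cluster before resubmitting (assuming the hadoop CLI is on the PATH):

    # Lists live and dead DataNodes as the NameNode currently sees them;
    # a just-rebooted node usually stays on the live list for a while.
    hadoop dfsadmin -report

    # Optionally check the job's staging files for missing or
    # under-replicated blocks.
    hadoop fsck /data/hadoop-admin/mapred/staging -files -blocks -locations

If the node still shows as live, waiting until it is either marked dead or has finished rebooting and re-registered should make the pipeline errors go away.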
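On the ulimit point raised in the quoted mail, a quick sanity check on a DataNode host (standard Linux commands; the process-name match for the DataNode is an assumption about how the daemon was started):

    # Open-file limit for the current shell (the daemon inherits its own)
    ulimit -n

    # Count sockets stuck in CLOSE_WAIT
    netstat -tan | grep -c CLOSE_WAIT

    # Descriptors actually held by the DataNode process
    ls /proc/"$(pgrep -f DataNode | head -n 1)"/fd | wc -l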
