Hi all,

When I run some jobs on Hadoop, some datanodes go dead and the job eventually fails. The datanode process itself is still alive, however, and once the cluster calms down the dead datanodes come back. While a datanode is down, I see error logs like this:
2/01/14 14:08:41 INFO mapred.JobClient: Task Id : attempt_201201082210_0051_m_000313_0, Status : FAILED
java.io.IOException: pipe child exception
    at org.apache.hadoop.mapred.pipes.Application.abort(Application.java:225)
    at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:101)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.io.IOException: Could not obtain block: blk_3541449604139837149_1405226 file=/testdata/part-00313
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1993)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1800)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:160)
    at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:208)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:193)
    at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:88)
    ... 7 more

The message "Could not obtain block: blk_* ..." reminds me of "dfs.datanode.max.xcievers", but I have already set it to 4096. How can I resolve this problem?
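
For reference, a setting of this kind is normally expressed as a property in hdfs-site.xml on each datanode. The fragment below is only a sketch, assuming the standard Hadoop configuration file layout; the value 4096 is the one mentioned above, and the file location may differ on a given installation.

```xml
<!-- hdfs-site.xml on each datanode (assumed location: the Hadoop conf directory) -->
<configuration>
  <property>
    <!-- The property name really is spelled "xcievers" in Hadoop of this era -->
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>
```

Note that the datanode reads this value at startup, so a change only takes effect after the datanode has been restarted.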