We were using Hadoop 0.20.2 when the issue occurred. After we set
dfs.datanode.max.xcievers to 2048, the failures stopped.
We are now using 0.20-append (HBase requires it), and it works well there too.
On 2011/02/21 10:35, Jun Young Kim wrote:
hi, yifeng.
Could I know which version of Hadoop you are using?
thanks for your response.
Junyoung Kim
On 02/21/2011 10:28 AM, Yifeng Jiang wrote:
Hi,
We have run into the same issue.
It seems that this error occurs when the number of threads connected to
the DataNode reaches the maximum number of server threads, which is set by
"dfs.datanode.max.xcievers" in hdfs-site.xml.
Our solution was to increase it from the default value (256) to a
bigger one, such as 2048.
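
For reference, the change would look roughly like the fragment below. This is only a sketch: the property name and the value 2048 are the ones described above, while putting the entry inside the <configuration> element of each DataNode's hdfs-site.xml and restarting the DataNodes afterwards is assumed here, not something we have spelled out for your cluster.

```xml
<!-- hdfs-site.xml on each DataNode (assumed location), inside <configuration> -->
<property>
  <!-- The property name really is spelled "xcievers" -->
  <name>dfs.datanode.max.xcievers</name>
  <!-- Raised from the default of 256; 2048 is the value that worked for us -->
  <value>2048</value>
</property>
```

The right value probably depends on how many files are open for writing at once, so treat 2048 as a starting point rather than a recommendation.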
On 2011/02/21 10:17, Jun Young Kim wrote:
hi,
In my application, I read many files from many directories.
In addition, using the MultipleOutputs class, I try to write
thousands of output files into many directories.
During reduce processing (the reduce task count is 1),
almost all of my jobs fail (on average, about 20 jobs run in parallel).
Most of the errors look like this:

java.io.IOException: Bad connect ack with firstBadLink as 10.25.241.101:50010
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:889)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:820)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Status.read(DataTransferProtocol.java:113)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:881)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:820)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:159)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
    at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/map_869.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:351)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
    at org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:182)
    at org.apache.hadoop.mapreduce.task.reduce.MergeMa

Currently, I suspect this is caused by a limit in Hadoop on the number of
output file descriptors it can keep open.
(I am running this job on a Linux server; the server configuration is:
$> cat /proc/sys/fs/file-max
327680
)
--
Yifeng Jiang