Rakesh,
That error log looks like it belongs to the DataNode, not the NameNode. Anyway,
try raising *dfs.datanode.max.xcievers* (shoot for at least 512; your log shows
the current limit of 256 being exceeded). Note the property goes in
hdfs-site.xml, not core-site.xml, and the DataNodes need a restart to pick it up.
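Something like this on each DataNode (512 is just a starting point; raise it further if the xceiverCount errors persist):

```xml
<!-- hdfs-site.xml on each DataNode; restart the DataNodes after changing.
     The property name really is spelled "xcievers" in this Hadoop version. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>512</value>
</property>
```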

-Shrijeet

On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari
<[email protected]>wrote:

>  Hi,
>
> My MR Job is processing gzipped files each around 450 MB and there are 24
> of them. File block size is 512 MB.
>
> This job is failing consistently in the reduce phase with the following
> exception (below). Any ideas on how to troubleshoot this?
>
> Thanks,
> -Rakesh
>
> Datanode logs:
>
> INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10
> segments left of total size: 408736960 bytes
>
> 2010-10-12 07:25:01,020 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.185.13.61:50010
>
> 2010-10-12 07:25:01,021 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-961587459095414398_368580
>
> 2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.IOException: Bad connect ack with
> firstBadLink 10.185.13.61:50010
>
> 2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-7795697604292519140_368580
>
> 2010-10-12 07:27:05,526 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:05,527 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-7687883740524807660_368625
>
> 2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-5546440551650461919_368626
>
> 2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_-3894897742813130478_368628
>
> 2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
> createBlockOutputStream java.io.EOFException
>
> 2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
> block blk_8687736970664350304_368652
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: java.io.IOException: Unable to create new block.
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
>
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block blk_8687736970664350304_368652 bad datanode[0] nodes ==
> null
>
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Could not
> get block locations. Source file
> "/tmp/dartlog-json-serializer/20100929_/_temporary/_attempt_201010082153_0040_r_000000_2/jp/dart-imp-json/2010/09/29/17/part-r-00000.gz"
> - Aborting...
>
> 2010-10-12 07:27:30,196 WARN org.apache.hadoop.mapred.TaskTracker: Error
> running child
>
> java.io.EOFException
>
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>
>         at
> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>
>         at
> org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>
>         at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 2010-10-12 07:27:30,199 INFO org.apache.hadoop.mapred.TaskRunner: Runnning
> cleanup for the task
>
>
> Namenode is throwing the following exception:
>
> 2010-10-12 07:27:30,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_-892355450837523222_368657 src: /10.43.102.69:42352 dest: 
> /10.43.102.69:50010
>
> 2010-10-12 07:27:30,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> writeBlock blk_-892355450837523222_368657 received exception 
> java.io.EOFException
>
> 2010-10-12 07:27:30,206 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.43.102.69:50010, 
> storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, 
> ipcPort=50020):DataXceiver
>
> java.io.EOFException
>
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>
>         at 
> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
>
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:30,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_786696549206331718_368657 src: /10.184.82.24:53457 dest: 
> /10.43.102.69:50010
>
> 2010-10-12 07:27:30,459 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving block blk_-6729043740571856940_368657 src: /10.185.13.60:41816 
> dest: /10.43.102.69:50010
>
> 2010-10-12 07:27:30,468 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: 
> /10.185.13.61:48770, dest: /10.43.102.69:50010, bytes: 1626784, op: 
> HDFS_WRITE, cliID: DFSClient_attempt_201010082153_0040_r_000000_2, srvID: 
> DS-859924705-10.43.102.69-50010-1271546912162, blockid: 
> blk_9216465415312085861_368611
>
> 2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> PacketResponder 0 for block blk_9216465415312085861_368611 terminating
>
> 2010-10-12 07:27:30,755 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification 
> succeeded for blk_5680087852988027619_321244
>
> 2010-10-12 07:27:30,759 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification 
> succeeded for blk_-1637914415591966611_321290
>
> …
>
> 2010-10-12 07:27:56,412 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.43.102.69:50010, 
> storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, 
> ipcPort=50020):DataXceiver
>
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent 
> xcievers 256
>
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:56,976 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification 
> succeeded for blk_5731266331675183628_321238
>
> 2010-10-12 07:27:57,669 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.43.102.69:50010, 
> storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, 
> ipcPort=50020):DataXceiver
>
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent 
> xcievers 256
>
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>
>         at java.lang.Thread.run(Thread.java:619)
>
> 2010-10-12 07:27:58,976 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> DatanodeRegistration(10.43.102.69:50010, 
> storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, 
> ipcPort=50020):DataXceiver
>
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent 
> xcievers 256
>
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>
>         at java.lang.Thread.run(Thread.java:619)
>
