Rakesh,

That error log looks like it belongs to a DataNode, not the NameNode. Anyway, try raising the *dfs.datanode.max.xcievers* parameter (shoot for 512). Note that this property belongs in hdfs-site.xml, not core-site.xml.
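For reference, a minimal sketch of the property block (512 is just the value suggested above; busy clusters often set it higher). As far as I know the DataNodes need a restart to pick up the change:

```xml
<!-- hdfs-site.xml, on each DataNode: raise the cap on concurrent
     block transceiver (DataXceiver) threads. The property name
     really is spelled "xcievers" in this Hadoop version. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>512</value>
</property>
```

Your log shows xceiverCount 258 exceeding the default limit of 256, which is why bumping this cap is the first thing to try.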
-Shrijeet

On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari <[email protected]> wrote:
> Hi,
>
> My MR job is processing gzipped files, each around 450 MB, and there are 24
> of them. File block size is 512 MB.
>
> This job is failing consistently in the reduce phase with the following
> exception (below). Any ideas how to troubleshoot this?
>
> Thanks,
> -Rakesh
>
> Datanode logs:
>
> INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 408736960 bytes
> 2010-10-12 07:25:01,020 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
> 2010-10-12 07:25:01,021 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-961587459095414398_368580
> 2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.185.13.61:50010
> 2010-10-12 07:25:07,206 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7795697604292519140_368580
> 2010-10-12 07:27:05,526 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> 2010-10-12 07:27:05,527 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7687883740524807660_368625
> 2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> 2010-10-12 07:27:11,713 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-5546440551650461919_368626
> 2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> 2010-10-12 07:27:17,898 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3894897742813130478_368628
> 2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> 2010-10-12 07:27:24,081 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8687736970664350304_368652
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8687736970664350304_368652 bad datanode[0] nodes == null
> 2010-10-12 07:27:30,186 WARN org.apache.hadoop.hdfs.DFSClient: Could not get block locations. Source file "/tmp/dartlog-json-serializer/20100929_/_temporary/_attempt_201010082153_0040_r_000000_2/jp/dart-imp-json/2010/09/29/17/part-r-00000.gz" - Aborting...
> 2010-10-12 07:27:30,196 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
> 2010-10-12 07:27:30,199 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
>
> Namenode is throwing following exception:
>
> 2010-10-12 07:27:30,026 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-892355450837523222_368657 src: /10.43.102.69:42352 dest: /10.43.102.69:50010
> 2010-10-12 07:27:30,206 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-892355450837523222_368657 received exception java.io.EOFException
> 2010-10-12 07:27:30,206 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
> java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-10-12 07:27:30,272 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_786696549206331718_368657 src: /10.184.82.24:53457 dest: /10.43.102.69:50010
> 2010-10-12 07:27:30,459 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6729043740571856940_368657 src: /10.185.13.60:41816 dest: /10.43.102.69:50010
> 2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.185.13.61:48770, dest: /10.43.102.69:50010, bytes: 1626784, op: HDFS_WRITE, cliID: DFSClient_attempt_201010082153_0040_r_000000_2, srvID: DS-859924705-10.43.102.69-50010-1271546912162, blockid: blk_9216465415312085861_368611
> 2010-10-12 07:27:30,468 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_9216465415312085861_368611 terminating
> 2010-10-12 07:27:30,755 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5680087852988027619_321244
> 2010-10-12 07:27:30,759 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1637914415591966611_321290
> …
> 2010-10-12 07:27:56,412 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-10-12 07:27:56,976 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_5731266331675183628_321238
> 2010-10-12 07:27:57,669 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>         at java.lang.Thread.run(Thread.java:619)
> 2010-10-12 07:27:58,976 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.43.102.69:50010, storageID=DS-859924705-10.43.102.69-50010-1271546912162, infoPort=8501, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 258 exceeds the limit of concurrent xcievers 256
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>         at java.lang.Thread.run(Thread.java:619)
