Raghu, this seems to be resolved by your patch: http://issues.apache.org/jira/browse/HADOOP-3007
Do you know of any other "complaints" on this issue (conn reset & related errors) after applying this patch? Thanks.

On Wed, Jul 16, 2008 at 4:04 PM, brainstorm <[EMAIL PROTECTED]> wrote:
> Just for the record, as I have seen on previous archives regarding
> this same problem, I've replaced the (cheap) 10/100 switch with a
> (robust?) 100/1000 one and a couple of ethernet cables... and nope, in
> my case it's not hardware related (at least on the switch/cable end).
>
> Any other hints?
>
> Thanks in advance!
>
> On Wed, Jul 16, 2008 at 3:12 PM, brainstorm <[EMAIL PROTECTED]> wrote:
>> If you are referring to the other nodes:
>>
>> 2008-07-16 14:41:00,124 ERROR dfs.DataNode -
>> 192.168.0.252:50010:DataXceiver: java.io.IOException: Block
>> blk_7443738244200783289 has already been started (though not
>> completed), and thus cannot be created.
>>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:638)
>>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
>>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
>>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
>>         at java.lang.Thread.run(Thread.java:595)
>>
>> 2008-07-16 14:41:00,309 ERROR dfs.DataNode -
>> 192.168.0.252:50010:DataXceiver: java.io.IOException: Block
>> blk_7443738244200783289 is valid, and cannot be written to.
>>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:608)
>>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
>>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
>>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
>>         at java.lang.Thread.run(Thread.java:595)
>>
>> and:
>>
>> 2008-07-16 14:41:00,178 WARN dfs.DataNode -
>> 192.168.0.253:50010:Failed to transfer blk_7443738244200783289 to
>> 192.168.0.252:50010 got java.net.SocketException: Connection reset
>>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>>         at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>>         at java.lang.Thread.run(Thread.java:595)
>>
>> (These also seem to be inter-node DFS communication errors :-/)
>>
>> On Tue, Jul 15, 2008 at 11:19 PM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
>>>
>>> Are there any errors reported on the other side of the socket (for the first
>>> error below, it's the datanode on 192.168.0.251)?
>>>
>>> Raghu.
>>>
>>> brainstorm wrote:
>>>>
>>>> I'm getting the following WARNINGs that seem to slow down my nutch
>>>> processes on a cluster of 3 nodes plus 1 frontend:
>>>>
>>>> 2008-07-15 18:53:19,048 WARN dfs.DataNode -
>>>> 192.168.0.100:50010:Failed to transfer blk_-8676066332392254756 to
>>>> 192.168.0.251:50010 got java.net.SocketException: Connection reset
>>>>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>>>>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>>>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>>>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>>>>         at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>>>>         at java.lang.Thread.run(Thread.java:595)
>>>>
>>>> 2008-07-15 18:53:52,162 WARN dfs.DataNode -
>>>> 192.168.0.100:50010:Failed to transfer blk_5699662911845813103 to
>>>> 192.168.0.253:50010 got java.net.SocketException: Broken pipe
>>>>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>>>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>>>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>>>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>>>>         at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>>>>         at java.lang.Thread.run(Thread.java:595)
>>>>
>>>> I've looked for firewalling issues, but right now the test setup is:
>>>>
>>>> 3 nodes with "iptables -F" applied (default ACCEPT policy for INPUT &
>>>> OUTPUT, i.e. no firewall).
>>>>
>>>> The frontend console (192.168.0.100) has ACCEPT rules for node-to-node
>>>> & frontend traffic.
>>>>
>>>> I've been debugging with Wireshark, but all I see is RST packets sent
>>>> from the frontend to the nodes, no corrupted frames... When there's no
>>>> reset, I just see .jar contents flying by (RMI?)... What am I missing
>>>> here? :-S
>>>
>>>
>>
>
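For anyone digging through the archives later: before suspecting the DFS
layer, it can help to rule out plain network/firewall trouble (like the RST
packets described above) with a bare TCP probe against the DataNode transfer
port (50010 by default), run outside Hadoop entirely. Below is a minimal
sketch, not part of Hadoop itself; the default host and port are just the
addresses from this thread and should be adjusted for your own cluster.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    // Bare TCP probe against a DataNode transfer port. A connect timeout or
    // "connection refused" here points at the network/firewall, while a clean
    // connect suggests the resets happen at the DFS protocol level instead.
    public class PortProbe {
        public static void main(String[] args) {
            // Defaults taken from the addresses in this thread; adjust them
            // for your own cluster.
            String host = args.length > 0 ? args[0] : "192.168.0.251";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;

            Socket socket = new Socket();
            try {
                // 5s timeout so a firewall that silently drops SYNs shows up
                // as a SocketTimeoutException instead of a long hang.
                socket.connect(new InetSocketAddress(host, port), 5000);
                System.out.println("OK: connected to " + host + ":" + port);
            } catch (IOException e) {
                System.out.println("FAILED: " + host + ":" + port + " -> " + e);
            } finally {
                try { socket.close(); } catch (IOException ignored) {}
            }
        }
    }

Run it from every node toward every other node (and the frontend) in both
directions, e.g. "java PortProbe 192.168.0.252 50010". If the raw socket
connects cleanly while block transfers still get reset, the problem is more
likely in the DFS layer (e.g. what the HADOOP-3007 patch addresses) than in
the switch or cabling.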
