Just for the record, since I've seen this same problem in previous archive threads: I've replaced the (cheap) 10/100 switch with a (robust?) 100/1000 one, plus a couple of ethernet cables... and no, in my case it's not hardware-related (at least not on the switch/cable end).
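In case it helps anyone else poking at the same thing: before going back to wireshark I ruled out basic TCP reachability of the DataNode data-transfer port (50010 by default) from each box. A minimal sketch, nothing Hadoop-specific, just a plain connect test; the IPs are the ones from the logs quoted below, so adjust for your cluster:

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Attempt a plain TCP connect to host:port; True if the connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, and timeout
        return False

if __name__ == "__main__":
    # DataNode addresses taken from the error logs in this thread.
    for host in ("192.168.0.251", "192.168.0.252", "192.168.0.253"):
        status = "open" if can_connect(host, 50010) else "unreachable"
        print(f"{host}:50010 {status}")
```

If all ports test open from every node (including the frontend) and the resets persist, the problem is more likely above the transport layer than in switches or firewall rules.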
Any other hints? Thanks in advance!

On Wed, Jul 16, 2008 at 3:12 PM, brainstorm <[EMAIL PROTECTED]> wrote:
> If you refer to the other nodes:
>
> 2008-07-16 14:41:00,124 ERROR dfs.DataNode -
> 192.168.0.252:50010:DataXceiver: java.io.IOException: Block
> blk_7443738244200783289 has already been started (though not
> completed), and thus cannot be created.
>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:638)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
>         at java.lang.Thread.run(Thread.java:595)
>
> 2008-07-16 14:41:00,309 ERROR dfs.DataNode -
> 192.168.0.252:50010:DataXceiver: java.io.IOException: Block
> blk_7443738244200783289 is valid, and cannot be written to.
>         at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:608)
>         at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1983)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1074)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:938)
>         at java.lang.Thread.run(Thread.java:595)
>
> and:
>
> 2008-07-16 14:41:00,178 WARN dfs.DataNode -
> 192.168.0.253:50010:Failed to transfer blk_7443738244200783289 to
> 192.168.0.252:50010 got java.net.SocketException: Connection reset
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>         at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>         at java.lang.Thread.run(Thread.java:595)
>
> (These seem to be inter-node DFS communication errors as well :-/)
>
> On Tue, Jul 15, 2008 at 11:19 PM, Raghu Angadi <[EMAIL PROTECTED]> wrote:
>>
>> Are there any errors reported on the other side of the socket (for the first
>> error below, it's the datanode on 192.168.0.251)?
>>
>> Raghu.
>>
>> brainstorm wrote:
>>>
>>> I'm getting the following WARNINGs that seem to slow down my nutch
>>> processes on a cluster of 3 nodes plus 1 frontend:
>>>
>>> 2008-07-15 18:53:19,048 WARN dfs.DataNode -
>>> 192.168.0.100:50010:Failed to transfer blk_-8676066332392254756 to
>>> 192.168.0.251:50010 got java.net.SocketException: Connection reset
>>>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>>>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>>>         at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>>>         at java.lang.Thread.run(Thread.java:595)
>>>
>>> 2008-07-15 18:53:52,162 WARN dfs.DataNode -
>>> 192.168.0.100:50010:Failed to transfer blk_5699662911845813103 to
>>> 192.168.0.253:50010 got java.net.SocketException: Broken pipe
>>>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
>>>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
>>>         at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
>>>         at java.lang.Thread.run(Thread.java:595)
>>>
>>> I've looked for firewalling issues, but right now the test setup is:
>>>
>>> 3 nodes with "iptables -F" (default ACCEPT policy for INPUT & OUTPUT,
>>> aka no firewall).
>>>
>>> Frontend console (192.168.0.100) has ACCEPT for node-to-node & frontend traffic.
>>>
>>> I've been debugging with wireshark, but all I see is RST packets sent
>>> from the frontend to the nodes, no corrupted frames... When there's no
>>> reset, I just see .jar contents flying by (RMI?)... What am I missing
>>> here? :-S
