Are there any errors reported on the other side of the socket? For the first error below, that is the datanode on 192.168.0.251.
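
A rough sketch of one way to do that check, assuming the datanode log on the receiving node uses the same timestamped format as the warnings quoted below. The function name and the log path in the usage comment are made up for illustration; adjust them to your install.

```python
import re

# Start of a log record, e.g. "2008-07-15 18:53:19,048 WARN  dfs.DataNode - ..."
# (assumed to match the format of the warnings quoted in this thread).
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def records_mentioning(lines, block_id):
    """Group log lines into records and return the records that mention block_id.

    A record is a timestamped line plus any continuation lines that follow it
    (wrapped message text or stack frames).
    """
    records, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line):
            if current:
                records.append(current)
            current = [line]
        elif current:
            current.append(line)  # continuation of the current record
    if current:
        records.append(current)
    return ["\n".join(r) for r in records if any(block_id in part for part in r)]


# Hypothetical usage on the receiving datanode (the path is an assumption):
#
#   with open("/path/to/datanode.log") as fh:
#       for rec in records_mentioning(fh, "blk_-8676066332392254756"):
#           print(rec, end="\n\n")
```

Anything logged around 18:53:19 for that block on 192.168.0.251 would be the interesting part.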

Raghu.

brainstorm wrote:
I'm getting the following WARNINGs, which seem to slow down my nutch
processes on a cluster with 3 nodes and 1 frontend:

2008-07-15 18:53:19,048 WARN  dfs.DataNode -
192.168.0.100:50010:Failed to transfer blk_-8676066332392254756 to
192.168.0.251:50010 got java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
        at java.lang.Thread.run(Thread.java:595)

2008-07-15 18:53:52,162 WARN  dfs.DataNode -
192.168.0.100:50010:Failed to transfer blk_5699662911845813103 to
192.168.0.253:50010 got java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1602)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1636)
        at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:2391)
        at java.lang.Thread.run(Thread.java:595)

I've looked for firewalling issues but right now the test setup is:

3 nodes with "iptables -F" and the default ACCEPT policy for INPUT &
OUTPUT (i.e. no firewall).

Frontend console (192.168.0.100) has ACCEPT for NODE to NODE & frontend.

I've been debugging with wireshark, but all I see is RST packets sent
from frontend to nodes, no corrupted frames... When there's no reset,
I just see .jar contents flying by (RMI?)... What am I missing here? :-S
