slow-reading dfs clients do not recover from datanode write timeouts
--------------------------------------------------------------------

                 Key: HADOOP-3831
                 URL: https://issues.apache.org/jira/browse/HADOOP-3831
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.17.1
            Reporter: Christian Kunz


Some of our applications (using libhdfs) read certain files from dfs much more 
slowly than others, slowly enough to trigger the write timeout that 0.17.x 
introduced into the datanodes: the datanode blocks while pushing chunks to the 
slow client, gives up after the timeout (480000 millis in the trace below, 
i.e. 8 minutes), and drops the connection, which the client then sees as a 
premature EOF. Eventually the applications fail.

Dfs clients should be able to recover from such a situation.
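Until the client handles this itself, an application-level workaround may be 
possible: remember the last successfully read offset, and on an IOException 
close and reopen the stream (which lets the client pick a different replica) 
and seek back. Below is a minimal sketch against the public FileSystem API; 
the class and method names, buffer size, and retry limit are illustrative 
only, not anything that exists in Hadoop:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReopeningReader {
  // Hypothetical helper: read path to EOF, reopening and seeking back to
  // the last good offset whenever the datanode connection is lost.
  public static void readWithReopen(FileSystem fs, Path path) throws IOException {
    byte[] buf = new byte[64 * 1024];
    long pos = 0;          // last successfully read offset
    int failures = 0;
    FSDataInputStream in = fs.open(path);
    try {
      while (true) {
        try {
          in.seek(pos);
          int n = in.read(buf, 0, buf.length);
          if (n < 0) {
            break;                      // clean end of file
          }
          pos += n;                     // advance only after a good read
          failures = 0;
          // ... hand buf[0..n) to the application here ...
        } catch (IOException e) {
          if (++failures > 3) {
            throw e;                    // give up after repeated failures
          }
          in.close();
          in = fs.open(path);           // reopen; may choose another replica
        }
      }
    } finally {
      in.close();
    }
  }
}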

In the meantime, would setting
dfs.datanode.socket.write.timeout=0
in hadoop-site.xml help?
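
For reference, that would mean adding the following property to the datanodes' 
hadoop-site.xml; as I understand it, a value of 0 disables the write timeout 
entirely, which is why I am asking whether it is safe:

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>0</value>
  </property>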

Here are the exceptions I see:

DataNode:

2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got exception while serving blk_3304550638094049753 to /yyy:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
        at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
        at java.lang.Thread.run(Thread.java:619)

DFS Client:

08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException: Premeture EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
    at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
    at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
    at java.io.DataInputStream.read(DataInputStream.java:83)

08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block blk_3304550638094049753 from any node:  java.io.IOException: No live nodes contain current block

