[jira] Updated: (HADOOP-3831) slow-reading dfs clients do not recover from datanode-write-timeouts

Raghu Angadi (JIRA) Wed, 10 Sep 2008 15:25:08 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Raghu Angadi updated HADOOP-3831:
---------------------------------

    Attachment: HADOOP-3831.patch

Hairong, updated patch adds this comment. Does this help ?:


{code}[...]
     /* we retry current node only once. So this is set to true just here.
       * Intention is to handle one common case of an error that is not a
       * failure on datanode or client : when DataNode closes the connection
       * since client is idle. If there are other cases of "non-errors" then
       * then a datanode might be retried by setting this to true again.
       */
 {code}

> slow-reading dfs clients do not recover from datanode-write-timeouts
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3831
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3831
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.1
>            Reporter: Christian Kunz
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-3831.patch, HADOOP-3831.patch, HADOOP-3831.patch, 
> HADOOP-3831.patch
>
>
> Some of our applications read through certain files from dfs (using libhdfs) 
> much slower than through others, such that they trigger the write timeout 
> introduced in 0.17.x into the datanodes. Eventually they fail.
> Dfs clients should be able to recover from such a situation.
> In the meantime, would setting
> dfs.datanode.socket.write.timeout=0
> in hadoop-site.xml help?
> Here are the exceptions I see:
> DataNode:
> 2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got 
> exception while serving blk_3304550638094049
> 753 to /yyy:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for 
> channel to be ready for write. ch : java.nio.channels.
> SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
>         at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
>         at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) 
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at 
> org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
>         at 
> org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
>         at 
> org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039) 
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
>         at java.lang.Thread.run(Thread.java:619)
> DFS Client:
> 08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from 
> blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException: Premeture 
> EOF from inputStream
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>     at 
> org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
>     at 
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
>     at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
>     at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>     at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
>     at 
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
>     at 
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
>     at java.io.DataInputStream.read(DataInputStream.java:83)
> 08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block 
> blk_3304550638094049753 from any node:  java.io.IOException: No live nodes 
> contain current block

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-3831) slow-reading dfs clients do not recover from datanode-write-timeouts

Reply via email to