[
https://issues.apache.org/jira/browse/HADOOP-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raghu Angadi updated HADOOP-3831:
---------------------------------
Attachment: HADOOP-3831.patch
Hairong, the updated patch adds this comment. Does this help?
{code}[...]
/* We retry the current node only once, so this is set to true only here.
 * The intention is to handle one common case of an error that is not a
 * failure of the datanode or the client: the DataNode closing the connection
 * because the client is idle. If there are other such "non-error" cases,
 * a datanode might be retried by setting this to true again.
 */
{code}
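For illustration only (this is not the actual DFSClient code, and the helper names below are invented for the sketch), here is the intent behind that comment: the same datanode gets exactly one retry when the failure looks like an idle-connection close, and otherwise the client moves on to another replica.
{code}
// Hypothetical sketch only -- not the DFSClient implementation; helper names are invented.
import java.io.IOException;

class RetryCurrentNodeSketch {
  private static final int MAX_ATTEMPTS = 6;

  // Stand-ins for the real client calls.
  static byte[] readFromCurrentDatanode() throws IOException {
    throw new IOException("connection closed by idle datanode");
  }
  static void switchToAnotherDatanode() { /* pick a different replica */ }

  static byte[] read() throws IOException {
    // Per the comment above: set to true only here; other "non-error" cases
    // could set it to true again.
    boolean retryCurrentNode = true;
    IOException last = null;
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
      try {
        return readFromCurrentDatanode();
      } catch (IOException e) {
        last = e;
        if (retryCurrentNode) {
          // Likely the datanode closed an idle connection: not a real failure,
          // so retry the same node exactly once.
          retryCurrentNode = false;
        } else {
          // Treat it as a real failure and move on to a different replica.
          switchToAnotherDatanode();
        }
      }
    }
    throw last;
  }
}
{code}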
> slow-reading dfs clients do not recover from datanode-write-timeouts
> --------------------------------------------------------------------
>
> Key: HADOOP-3831
> URL: https://issues.apache.org/jira/browse/HADOOP-3831
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.17.1
> Reporter: Christian Kunz
> Assignee: Raghu Angadi
> Attachments: HADOOP-3831.patch, HADOOP-3831.patch, HADOOP-3831.patch,
> HADOOP-3831.patch
>
>
> Some of our applications read through certain files in dfs (using libhdfs)
> much more slowly than through others, slowly enough to trigger the write
> timeout introduced into the datanodes in 0.17.x. Eventually they fail.
> DFS clients should be able to recover from such a situation.
> In the meantime, would setting
> dfs.datanode.socket.write.timeout=0
> in hadoop-site.xml help?
> Here are the exceptions I see:
> DataNode:
> 2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got
> exception while serving blk_3304550638094049753 to /yyy:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
> at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
> at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
> at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
> at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
> at java.lang.Thread.run(Thread.java:619)
> DFS Client:
> 08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from
> blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException:
> Premeture EOF from inputStream
> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
> at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
> at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
> at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
> at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
> at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
> at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
> at java.io.DataInputStream.read(DataInputStream.java:83)
> 08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block
> blk_3304550638094049753 from any node: java.io.IOException: No live nodes
> contain current block
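Regarding the workaround asked about in the description: as far as I know, dfs.datanode.socket.write.timeout is read by the datanodes from their configuration, and a value of 0 disables the write timeout. A hadoop-site.xml entry on the datanodes along these lines is the usual way to try it, though it only works around the symptom rather than fixing client-side recovery:
{code}
<!-- Workaround sketch for hadoop-site.xml on the datanodes: a value of 0
     disables the 480000 ms (8 minute) socket write timeout. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
{code}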