slow-reading dfs clients do not recover from datanode-write-timeouts
--------------------------------------------------------------------
Key: HADOOP-3831
URL: https://issues.apache.org/jira/browse/HADOOP-3831
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.17.1
Reporter: Christian Kunz
Some of our applications (using libhdfs) read certain files from dfs much more slowly than others, slowly enough to trigger the write timeout introduced into the datanodes in 0.17.x. Eventually they fail.
Dfs clients should be able to recover from such a situation.
In the meantime, would setting
dfs.datanode.socket.write.timeout=0
in hadoop-site.xml help?
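
As a possible interim workaround (a sketch only, assuming a value of 0 disables the write timeout rather than fixing the underlying recovery problem), the property could be set in hadoop-site.xml like this:

```xml
<!-- hadoop-site.xml: disable the datanode socket write timeout.
     Assumption: 0 is interpreted as "no timeout"; this only masks
     the symptom for slow-reading clients, it is not a fix. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
```

This would need to be set on the datanodes (which enforce the timeout), not just on the client side.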
Here are the exceptions I see:
DataNode:
2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got exception while serving blk_3304550638094049753 to /yyy:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
    at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
    at java.lang.Thread.run(Thread.java:619)
DFS Client:
08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException: Premeture EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
    at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
    at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
    at java.io.DataInputStream.read(DataInputStream.java:83)
08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block blk_3304550638094049753 from any node: java.io.IOException: No live nodes contain current block