[ https://issues.apache.org/jira/browse/HDFS-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159609#comment-14159609 ]
Yongjun Zhang commented on HDFS-3342:
-------------------------------------
Hi [~tlipcon],
The reason I'm looking at this issue is that it still happens in recent releases
and confuses users. Would you please help review the patch? Thanks a lot.
I was able to reproduce the issue and saw the following log:
{code}
14/10/04 21:12:04 INFO datanode.DataNode: Failed to send data:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:60227]
14/10/04 21:12:04 WARN datanode.DataNode: DatanodeRegistration(172.17.186.17,
datanodeUuid=95f0a627-b010-453b-a432-c147d012c814, infoPort=42075,
ipcPort=42022,
storageInfo=lv=-56;cid=CID-13a9b341-3a15-405e-8d07-a719ec9be2ac;nsid=1866275128;c=0):Got
exception while serving
BP-326257059-172.17.186.17-1412481724026:blk_1073741825_1001 to
/172.17.186.17:60227
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:60227]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
    at java.lang.Thread.run(Thread.java:724)
14/10/04 21:12:04 ERROR datanode.DataNode:
haus03.sjc.cloudera.com:42010:DataXceiver error processing READ_BLOCK operation
src: /172.17.186.17:60227 dst: /172.17.186.17:42010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:60227]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
    at java.lang.Thread.run(Thread.java:724)
{code}
I found that the top portion
{code}
14/10/04 21:12:04 INFO datanode.DataNode: Failed to send data:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:60227]
{code}
was introduced by HDFS-3555 for the same issue. But the fix there still throws
the exception, which is not handled, so we are seeing the reported error.
I'm submitting a patch that changes the output to:
{code}
14/10/05 10:56:57 INFO datanode.DataNode: Failed to send data:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:41933]
14/10/05 10:56:57 WARN datanode.DataNode: DatanodeRegistration(172.17.186.17,
datanodeUuid=2a87010c-c9fe-4b4e-a249-0d8bf11a8f41, infoPort=42075,
ipcPort=42022,
storageInfo=lv=-56;cid=CID-ba2f8c8b-7e49-4514-b74c-201c1e9508ad;nsid=1860548702;c=0):Got
exception while serving
BP-269685814-172.17.186.17-1412528857362:blk_1073741825_1001 to
/172.17.186.17:41933
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:41933]
    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:550)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:730)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:677)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:490)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
    at java.lang.Thread.run(Thread.java:724)
14/10/05 10:56:57 INFO datanode.DataNode: Likely the client has stopped
reading, disconnecting it (haus03.sjc.cloudera.com:42010:DataXceiver error
processing READ_BLOCK operation src: /172.17.186.17:41933 dst:
/172.17.186.17:42010; java.net.SocketTimeoutException: 480000 millis timeout
while waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/172.17.186.17:42010
remote=/172.17.186.17:41933])
{code}
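To illustrate the idea (this is not the actual patch; the class and method names below are hypothetical), the handling pattern is: catch the write-side SocketTimeoutException during a READ_BLOCK, log it at INFO as "the client likely stopped reading", and only surface other failures as errors:

```java
import java.net.SocketTimeoutException;

public class ReadBlockHandler {

    // Hypothetical sketch: map a write-side SocketTimeoutException to an
    // INFO-level "client stopped reading" message instead of letting it
    // propagate as an unhandled ERROR.
    public static String handle(Exception e, String clientAddr) {
        if (e instanceof SocketTimeoutException) {
            // The write timed out: the remote reader most likely went idle.
            return "INFO: Likely the client has stopped reading, disconnecting it ("
                    + clientAddr + "; " + e.getMessage() + ")";
        }
        // Anything else is still a genuine DataXceiver failure.
        return "ERROR: DataXceiver error processing READ_BLOCK operation: "
                + e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(handle(
                new SocketTimeoutException(
                        "480000 millis timeout while waiting for channel to be ready for write"),
                "/172.17.186.17:41933"));
    }
}
```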
> SocketTimeoutException in BlockSender.sendChunks could have a better error
> message
> ----------------------------------------------------------------------------------
>
> Key: HDFS-3342
> URL: https://issues.apache.org/jira/browse/HDFS-3342
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 2.0.0-alpha
> Reporter: Todd Lipcon
> Assignee: Yongjun Zhang
> Priority: Minor
>
> Currently, if a client connects to a DN and begins to read a block, but then
> stops calling read() for a long period of time, the DN will log a
> SocketTimeoutException "480000 millis timeout while waiting for channel to be
> ready for write." This is because there is no "keepalive" functionality of
> any kind. At a minimum, we should improve this error message to be an INFO
> level log which just says that the client likely stopped reading, so
> disconnecting it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)