[ https://issues.apache.org/jira/browse/HDFS-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881469#comment-16881469 ]
Yongjun Zhang commented on HDFS-9178:
-------------------------------------
Hi [~kihwal], many thanks for the work here!
> Slow datanode I/O can cause a wrong node to be marked bad
> ---------------------------------------------------------
>
> Key: HDFS-9178
> URL: https://issues.apache.org/jira/browse/HDFS-9178
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Fix For: 2.8.0, 2.7.2, 2.6.4, 3.0.0-alpha1
>
> Attachments: 002-HDFS-9178.branch-2.6.patch,
> HDFS-9178.branch-2.6.patch, HDFS-9178.patch
>
>
> When a non-leaf datanode in a pipeline is slow on or stuck at disk I/O, the
> downstream node can time out while reading packets, since even the heartbeat
> packets will not be relayed down.
> The packet read timeout is set in {{DataXceiver#run()}}:
> {code}
> peer.setReadTimeout(dnConf.socketTimeout);
> {code}
> When the downstream node times out and closes the connection to the upstream,
> the upstream node's {{PacketResponder}} gets {{EOFException}} and it sends an
> ack upstream with the downstream node status set to {{ERROR}}. This causes
> the client to exclude the downstream node, even though the upstream node was
> the one that got stuck.
> The connection to the downstream node has a longer timeout, so the downstream
> node will always time out first. The downstream timeout is set in {{writeBlock()}}:
> {code}
> int timeoutValue = dnConf.socketTimeout +
>     (HdfsConstants.READ_TIMEOUT_EXTENSION * targets.length);
> int writeTimeout = dnConf.socketWriteTimeout +
>     (HdfsConstants.WRITE_TIMEOUT_EXTENSION * targets.length);
> NetUtils.connect(mirrorSock, mirrorTarget, timeoutValue);
> OutputStream unbufMirrorOut = NetUtils.getOutputStream(mirrorSock,
>     writeTimeout);
> {code}
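
The timeout relationship described in the issue can be sketched in isolation. This is a minimal, standalone illustration and not the actual datanode code: the class name, the method names, and the millisecond values are assumptions chosen for the example, and only the arithmetic mirrors the two quoted snippets.

```java
public class PipelineTimeoutSketch {
    // Illustrative values only (assumptions, not read from any cluster config).
    static final int SOCKET_TIMEOUT_MS = 60_000;
    static final int READ_TIMEOUT_EXTENSION_MS = 5_000;

    /** Timeout a datanode applies when reading packets from its upstream peer,
     *  as in the quoted peer.setReadTimeout(dnConf.socketTimeout) call. */
    static int packetReadTimeout(int socketTimeoutMs) {
        return socketTimeoutMs;
    }

    /** Timeout a datanode uses toward its downstream mirror, following the
     *  quoted writeBlock() arithmetic: base timeout plus an extension per target. */
    static int mirrorTimeout(int socketTimeoutMs, int extensionMs, int numTargets) {
        return socketTimeoutMs + extensionMs * numTargets;
    }

    public static void main(String[] args) {
        for (int targets = 1; targets <= 3; targets++) {
            int downstreamWaits = packetReadTimeout(SOCKET_TIMEOUT_MS);
            int upstreamWaits = mirrorTimeout(
                SOCKET_TIMEOUT_MS, READ_TIMEOUT_EXTENSION_MS, targets);
            // The downstream read timeout is strictly shorter whenever there is
            // at least one target, so the downstream node gives up first.
            System.out.println("targets=" + targets
                + " downstream read timeout=" + downstreamWaits + " ms"
                + " upstream mirror timeout=" + upstreamWaits + " ms");
        }
    }
}
```

With any positive extension and at least one downstream target, the mirror timeout exceeds the packet read timeout, which is why the healthy downstream node is the one that closes the connection and ends up reported as {{ERROR}}.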