[
https://issues.apache.org/jira/browse/HDFS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kihwal Lee resolved HDFS-5032.
------------------------------
Resolution: Fixed
> Write pipeline failures caused by slow or busy disk may not be handled
> properly.
> --------------------------------------------------------------------------------
>
> Key: HDFS-5032
> URL: https://issues.apache.org/jira/browse/HDFS-5032
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.1.0-beta, 0.23.9
> Reporter: Kihwal Lee
> Assignee: Daryn Sharp
>
> Here is one scenario I recently encountered in an HBase cluster.
> The disk of the 1st datanode in a write pipeline became extremely busy for
> many minutes, which slowed down block writes on that disk. The 2nd
> datanode's socket read from the 1st datanode timed out after 60 seconds and
> disconnected. This caused a block recovery. The problem was that the 1st
> datanode had not written the last packet, but the downstream nodes had, and
> the ACK was sent back to the client. For this reason, the block recovery was
> issued up to the ACKed size.
> During the recovery, the first datanode was told to do copyBlock(). Since it
> didn't have enough data on disk, it waited in waitForMinLength(), which
> didn't help, so the command failed. The connection was already established to
> the target node for the copy, but the target never received any data. The
> data packet was eventually written, but it was too late for the copyBlock()
> call.
> The destination node for the copy had block metadata in memory, but no file
> was created on disk. When the client contacted this node for block recovery,
> the recovery failed there too.
> There are a few problems:
> - The faulty (slow) node was not detected correctly. Instead, the 2nd DN was
> excluded. The 1st DN's packet responder could have done a better job. It
> didn't have any outstanding ACKs to receive. Or the second DN could have
> tried to hint to the 1st DN what had happened.
> - copyBlock() could probably wait longer than 3 seconds in
> waitForMinLength(). Or it could check the on-disk size early on and fail
> early even before trying to establish a connection to the target.
> - Failed targets in block write/copy should clean up the record or make it
> recoverable.
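
A rough sketch of the second point above, in case it helps the discussion. This
is not the actual DataNode code: the class name, the method signatures, and the
LongSupplier/Runnable stand-ins for "on-disk replica length" and "connect to the
target and stream the block" are all hypothetical. It only illustrates checking
the on-disk length first, with a caller-chosen wait instead of a fixed 3
seconds, and opening the connection to the target only once the data is there.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration only; names and signatures are assumptions. */
final class MinLengthWait {
    private MinLengthWait() {}

    /**
     * Polls the on-disk length until it reaches minLength or timeoutMs elapses.
     * Returns true if the length was reached in time, false otherwise.
     */
    static boolean waitForMinLength(LongSupplier onDiskLength, long minLength,
                                    long timeoutMs, long pollMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (onDiskLength.getAsLong() < minLength) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still short after the whole wait
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    /**
     * Verifies the replica length BEFORE touching the target, so a slow disk
     * cannot leave the target with a connection and metadata but no data.
     */
    static boolean copyBlock(LongSupplier onDiskLength, long minLength,
                             long timeoutMs, Runnable connectAndSend) {
        if (!waitForMinLength(onDiskLength, minLength, timeoutMs, 10L)) {
            return false; // fail early; the target is never contacted
        }
        connectAndSend.run(); // assumed to connect and stream the block
        return true;
    }
}
```

With this ordering, a source that never reaches the ACKed length fails the copy
without the target ever creating an in-memory record, which also narrows the
cleanup problem in the third point.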