[ 
https://issues.apache.org/jira/browse/HDFS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975142#comment-14975142
 ] 

Kihwal Lee commented on HDFS-5032:
----------------------------------

bq. The faulty (slow) node was not detected correctly. Instead, the 2nd DN was 
excluded. The 1st DN's packet responder could have done a better job. It didn't 
have any outstanding ACKs to receive. Or the second DN could have tried to hint 
to the 1st DN about what happened.

Fixed by HDFS-9178. Absence of heartbeat during flush will be fixed in a 
separate jira by [~daryn].
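
Roughly the idea, as a sketch (the class and names here are made up, not the 
actual BlockReceiver/PacketResponder code): when there are no outstanding 
downstream ACKs, a long local packet write implicates this node's own disk, so 
the responder could blame itself instead of letting the downstream read 
timeout pick the wrong victim.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not the real PacketResponder.
class SlownessDetector {
  // Threshold is an assumption; real code would tie it to the socket timeout.
  private static final long LOCAL_WRITE_THRESHOLD_MS = 30_000;
  private final AtomicLong pendingDownstreamAcks = new AtomicLong();

  void onAckEnqueued() { pendingDownstreamAcks.incrementAndGet(); }
  void onAckReceived() { pendingDownstreamAcks.decrementAndGet(); }

  // Time the local packet write. With no outstanding ACKs the downstream
  // is healthy, so a slow write points at the local disk, not the pipeline.
  boolean isLocalWriteSlow(Runnable diskWrite) {
    long start = System.nanoTime();
    diskWrite.run();  // write packet data + checksum to the local disk
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    return elapsedMs > LOCAL_WRITE_THRESHOLD_MS
        && pendingDownstreamAcks.get() == 0;
  }
}
{code}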

bq. copyBlock() could probably wait longer than 3 seconds in 
waitForMinLength(). Or it could check the on-disk size early on and fail early 
even before trying to establish a connection to the target.

If the node stuck in I/O is correctly taken out, this will happen far less 
often. Also, HDFS-9106 will make this kind of failure non-fatal.
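
The suggested ordering could look roughly like this (a sketch only; the real 
waitForMinLength signature, the slack, and the longer timeout are all 
assumptions): fail fast on the on-disk length before dialing the copy target, 
and only then fall back to a bounded wait.

{code:java}
import java.io.File;
import java.io.IOException;

// Illustrative only; not the actual DataNode copyBlock() code path.
class CopyBlockSketch {
  // Assumed slack between on-disk and ACKed size before we give up early.
  private static final long MAX_LAG_BYTES = 4 * 1024 * 1024;

  static void waitForMinLength(File blockFile, long minLen, long timeoutMs)
      throws IOException, InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (blockFile.length() < minLen) {
      if (System.currentTimeMillis() > deadline) {
        throw new IOException("on-disk length " + blockFile.length()
            + " still below " + minLen + " after " + timeoutMs + " ms");
      }
      Thread.sleep(100);  // poll; real code could use notifications instead
    }
  }

  void copyBlock(File blockFile, long ackedLen) throws Exception {
    // Fail early: if the on-disk size is hopelessly behind the ACKed size,
    // give up before even connecting to the target.
    if (blockFile.length() + MAX_LAG_BYTES < ackedLen) {
      throw new IOException("replica too far behind ACKed size, failing early");
    }
    // Otherwise wait longer than the current 3 seconds (30s is assumed).
    waitForMinLength(blockFile, ackedLen, 30_000);
    // ... only now establish the connection to the copy target ...
  }
}
{code}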

bq. Failed targets in block write/copy should clean up the record or make it 
recoverable.

Fixed in HDFS-6948.
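
One way to realize the cleanup (a sketch with a stand-in replica map; this is 
not reproducing the actual HDFS-6948 patch or the FsDatasetImpl structures): 
if the transfer fails before anything reaches disk, drop the in-memory record 
so a later recovery doesn't find a replica with no backing file.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the datanode's in-memory replica bookkeeping.
class ReplicaMapSketch {
  private final Map<Long, Object> replicas = new ConcurrentHashMap<>();

  void receiveBlock(long blockId, Object replicaInfo, Runnable transfer) {
    replicas.put(blockId, replicaInfo);
    try {
      transfer.run();  // receive packets and create the on-disk block file
    } catch (RuntimeException e) {
      // Nothing reached disk; remove the record instead of leaving a
      // phantom replica that makes the next recovery attempt fail too.
      replicas.remove(blockId);
      throw e;
    }
  }
}
{code}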

> Write pipeline failures caused by slow or busy disk may not be handled 
> properly.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-5032
>                 URL: https://issues.apache.org/jira/browse/HDFS-5032
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta, 0.23.9
>            Reporter: Kihwal Lee
>            Assignee: Daryn Sharp
>
> Here is one scenario I recently encountered in an HBase cluster.
> The disk of the 1st datanode in a write pipeline became extremely busy for 
> many minutes, which slowed down block writes on that disk. The 2nd 
> datanode's socket read from the 1st datanode timed out after 60 seconds and 
> it disconnected. This caused a block recovery. The problem was that the 1st 
> datanode had not yet written the last packet, but the downstream nodes had, 
> and an ACK was sent back to the client. For this reason, the block recovery 
> was issued up to the ACKed size. 
> During the recovery, the first datanode was told to do copyBlock(). Since it 
> didn't have enough data on disk, it waited in waitForMinLength(), which 
> didn't help, so the command failed. The connection to the target node for 
> the copy had already been established, but the target never received any 
> data. The data packet was eventually written, but by then it was too late 
> for the copyBlock() call.
> The destination node for the copy had block metadata in memory, but no file 
> had been created on disk. When the client contacted this node for block 
> recovery, that attempt failed too. 
> There are a few problems:
> - The faulty (slow) node was not detected correctly. Instead, the 2nd DN was 
> excluded. The 1st DN's packet responder could have done a better job. It 
> didn't have any outstanding ACKs to receive. Or the second DN could have 
> tried to hint to the 1st DN about what happened. 
> - copyBlock() could probably wait longer than 3 seconds in 
> waitForMinLength(). Or it could check the on-disk size early on and fail 
> early even before trying to establish a connection to the target.
> - Failed targets in block write/copy should clean up the record or make it 
> recoverable.


