[ https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379964#comment-17379964 ]
Kihwal Lee commented on HDFS-16127:
-----------------------------------

The proposed solution is to check the size of {{ackQueue}} when {{waitForAllAcks()}} for the final packet throws an {{IOException}}. If the queue is empty, we can assume the last ack was received and the final packet for the block was removed from the queue, meaning no recovery is needed.

> Improper pipeline close recovery causes a permanent write failure or data loss.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-16127
>                 URL: https://issues.apache.org/jira/browse/HDFS-16127
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Priority: Major
>
> When a block is being closed, the data streamer in the client waits for the
> final ACK to be delivered. If an exception is received during this wait, the
> close is retried, on the assumption that the block was not successfully
> closed. That assumption was invalidated by HDFS-15813, resulting in
> permanent write failures in some close error cases involving slow nodes.
> There are also less frequent cases of data loss.
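To make the proposed check concrete, here is a minimal, self-contained sketch of the idea. The names {{ackQueue}} and {{waitForAllAcks()}} follow the comment above; everything else (the class name, the integer seqno queue, the method bodies) is a simplified stand-in for illustration, not the actual {{DataStreamer}} code or the committed patch.

{code:java}
import java.io.IOException;
import java.util.concurrent.LinkedBlockingDeque;

// Simplified stand-in for the client-side streamer. Integers model
// packet seqnos; the real client queues DFSPacket instances.
public class CloseRecoverySketch {

    // Packets sent but not yet acknowledged by the pipeline.
    private final LinkedBlockingDeque<Integer> ackQueue =
        new LinkedBlockingDeque<>();

    // Stand-in: blocks until the responder thread drains ackQueue, or
    // throws if the pipeline reports an error while we are waiting.
    private void waitForAllAcks() throws IOException {
        // ... elided ...
    }

    void closeBlock() throws IOException {
        try {
            waitForAllAcks();
        } catch (IOException ioe) {
            // Proposed check: an empty ackQueue means the last ack was
            // already received and the final packet was removed, so the
            // block is complete; swallow the exception instead of
            // triggering pipeline close recovery.
            if (ackQueue.isEmpty()) {
                return;
            }
            // Acks still outstanding: a real failure, so rethrow and
            // let the normal recovery path run.
            throw ioe;
        }
        // ... normal close path: end the block, notify the namenode ...
    }
}
{code}

Under this sketch's assumptions, a late {{IOException}} from a slow node that arrives after the final ack no longer forces a recovery attempt on an already-completed block.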