[ https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379964#comment-17379964 ]
Kihwal Lee commented on HDFS-16127:
-----------------------------------

The proposed solution is to check the size of {{ackQueue}} when {{waitForAllAcks()}} for the final packet throws an {{IOException}}. If the queue is empty, we can assume the last ack was received and the final packet for the block was removed from the queue, meaning no recovery is needed.

> Improper pipeline close recovery causes a permanent write failure or data loss.
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-16127
>                 URL: https://issues.apache.org/jira/browse/HDFS-16127
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Priority: Major
>
> When a block is being closed, the data streamer in the client waits for the
> final ACK to be delivered. If an exception is received during this wait, the
> close is retried, on the assumption that the block was not successfully
> closed. That assumption was invalidated by HDFS-15813, resulting in
> permanent write failures in some close error cases involving slow nodes.
> There are also less frequent cases of data loss.
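To make the proposed check concrete, here is a minimal, self-contained sketch of the idea. The names {{ackQueue}} and {{waitForAllAcks()}} follow the comment above; everything else (the class name, the integer seqno queue, the method bodies) is a simplified stand-in for illustration, not the actual {{DataStreamer}} code or the committed patch.

{code:java}
import java.io.IOException;
import java.util.concurrent.LinkedBlockingDeque;

// Simplified stand-in for the client-side streamer. Integers model
// packet seqnos; the real client queues DFSPacket instances.
public class CloseRecoverySketch {

    // Packets sent but not yet acknowledged by the pipeline.
    private final LinkedBlockingDeque<Integer> ackQueue =
        new LinkedBlockingDeque<>();

    // Stand-in: blocks until the responder thread drains ackQueue, or
    // throws if the pipeline reports an error while we are waiting.
    private void waitForAllAcks() throws IOException {
        // ... elided ...
    }

    void closeBlock() throws IOException {
        try {
            waitForAllAcks();
        } catch (IOException ioe) {
            // Proposed check: an empty ackQueue means the last ack was
            // already received and the final packet was removed, so the
            // block is complete; swallow the exception instead of
            // triggering pipeline close recovery.
            if (ackQueue.isEmpty()) {
                return;
            }
            // Acks still outstanding: a real failure, so rethrow and
            // let the normal recovery path run.
            throw ioe;
        }
        // ... normal close path: end the block, notify the namenode ...
    }
}
{code}

Under this sketch's assumptions, a late {{IOException}} from a slow node that arrives after the final ack no longer forces a recovery attempt on an already-completed block.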