[ https://issues.apache.org/jira/browse/HDFS-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12935560#action_12935560 ]
Thanh Do commented on HDFS-1103:
--------------------------------

"I do not think that this is the case in 0.21 & the trunk. In our lease recovery algorithm in 0.21, if there are 2 RBWs and 1 RWR, 1 RWR is excluded from the lease recovery. In the scenario that you described, RBW B and RBW C's GS is bumped and the length of recovered two replicas is truncated to MIN( len(B), len(C))."

Hairong, can you explain to me why the GSs of RBW B and RBW C are bumped up? Is that because of the lease recovery protocol? From my understanding of Todd's description, NN lease recovery is triggered after Machine A reports...

> Replica recovery doesn't distinguish between flushed-but-corrupted last chunk
> and unflushed last chunk
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1103
>                 URL: https://issues.apache.org/jira/browse/HDFS-1103
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-1103-test.txt
>
>
> When the DN creates a replica under recovery, it calls validateIntegrity,
> which truncates the last checksum chunk off of a replica if it is found to be
> invalid. Then when the block recovery process happens, this shortened block
> wins over a longer replica from another node where there was no corruption.
> Thus, if just one of the DNs has an invalid last checksum chunk, data that
> has been sync()ed to other datanodes can be lost.
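To make the two behaviors being compared concrete, here is a minimal Python sketch under a simplified model. It assumes (as the quoted reply implies) that Machine A holds the truncated replica as an RWR while B and C are RBWs, that the chunk size is 512 bytes, and that 1000 bytes were sync()ed. All names here (`State`, `Replica`, `truncate_invalid_last_chunk`, `recover`, `naive_recovery_length`) are hypothetical illustrations, not the HDFS API; the all-RWR branch in `recover` is an assumption the comment does not cover.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List, Tuple


class State(Enum):
    RBW = "RBW"  # replica being written
    RWR = "RWR"  # replica waiting to be recovered


@dataclass(frozen=True)
class Replica:
    node: str
    state: State
    length: int     # bytes on disk
    gen_stamp: int  # generation stamp (GS)


def truncate_invalid_last_chunk(length: int, chunk_size: int) -> int:
    """Drop the last checksum chunk (full or partial), modeling what the issue
    says validateIntegrity does when that chunk fails verification."""
    if length <= 0:
        return 0
    return ((length - 1) // chunk_size) * chunk_size


def naive_recovery_length(replicas: List[Replica]) -> int:
    """Behavior reported in the issue: the shortest replica wins."""
    return min(r.length for r in replicas)


def recover(replicas: List[Replica], new_gen_stamp: int) -> Tuple[List[Replica], int]:
    """Hypothetical model of the quoted 0.21 rule: if any RBW exists, RWRs are
    excluded; participants are truncated to MIN(len) and get the bumped GS."""
    rbws = [r for r in replicas if r.state is State.RBW]
    # Assumption (not stated in the comment): with no RBW, all replicas take part.
    participants = rbws if rbws else list(replicas)
    new_len = min(r.length for r in participants)
    # Replica state is left unchanged in this sketch.
    recovered = [replace(r, length=new_len, gen_stamp=new_gen_stamp) for r in participants]
    return recovered, new_len


CHUNK = 512
OLD_GS = 1001
a = Replica("A", State.RWR, truncate_invalid_last_chunk(1000, CHUNK), OLD_GS)  # 512 bytes
b = Replica("B", State.RBW, 1000, OLD_GS)
c = Replica("C", State.RBW, 1000, OLD_GS)

print(naive_recovery_length([a, b, c]))  # 512: synced bytes 512..999 would be lost
survivors, length = recover([a, b, c], new_gen_stamp=OLD_GS + 1)
print([r.node for r in survivors], length)  # ['B', 'C'] 1000
```

Under these assumptions, the RWR exclusion is what keeps A's truncated length from winning; if A's replica were instead an RBW, the same MIN(len) step would still pick the truncated length, which is the case the issue describes.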