[ 
https://issues.apache.org/jira/browse/HDFS-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867366#action_12867366
 ] 

Todd Lipcon commented on HDFS-1103:
-----------------------------------

Example scenario that would cause this issue:

1. Client A writing to DN A -> DN B -> DN C (the client and the first DN are on 
the same machine)
2. Machine A crashes (so both DN and client die at the same time)
[the last chunk of the replica on A is left corrupt, e.g. because the machine 
died mid-write or the journal was lost]
3. Machine A reboots, DN comes back up
4. validateIntegrity truncates the block to the previous checksum chunk boundary
5. NN lease expires, and NN triggers recovery
6. len(DN A) < len(others) but with the same generation stamp, so the shorter 
replica on DN A wins and synced data is lost during recovery
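
The steps above can be sketched as a toy model. This is not HDFS code: the 
Replica class, validate_integrity, recover_length and the 512-byte chunk size 
are all hypothetical names and values chosen for illustration, and the model 
assumes recovery settles on the shortest replica among those carrying the 
newest generation stamp.

```python
from dataclasses import dataclass

BYTES_PER_CHECKSUM = 512  # assumed checksum chunk size, for illustration only


@dataclass
class Replica:
    node: str
    length: int                  # bytes present on disk
    gen_stamp: int               # generation stamp of the replica
    last_chunk_valid: bool = True


def validate_integrity(r: Replica) -> Replica:
    """Drop a corrupt last chunk by truncating to the previous chunk boundary."""
    if r.last_chunk_valid or r.length == 0:
        return r
    truncated = ((r.length - 1) // BYTES_PER_CHECKSUM) * BYTES_PER_CHECKSUM
    return Replica(r.node, truncated, r.gen_stamp, True)


def recover_length(replicas):
    """Assumed recovery rule: shortest replica with the newest generation stamp."""
    newest = max(r.gen_stamp for r in replicas)
    return min(r.length for r in replicas if r.gen_stamp == newest)


# DN A crashed mid-write; B and C hold the full 1300 synced bytes.
replicas = [
    validate_integrity(r)
    for r in (
        Replica("A", 1300, 1, last_chunk_valid=False),
        Replica("B", 1300, 1),
        Replica("C", 1300, 1),
    )
]
print(recover_length(replicas))  # 1024: 276 synced bytes are dropped
```

Because all three replicas share generation stamp 1, nothing in this model 
lets recovery tell that A's shorter length came from a truncation rather than 
from data that was never written.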

Even without the inconsistent checksum at the end of the file, we're likely to 
lose data in this case since we don't actually ever call fsync(). So the 
replica on DN A is likely to be significantly truncated compared to the 
replicas on B and C.
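
For readers less familiar with the distinction, here is a minimal sketch of 
what an explicit fsync adds over a plain write. It uses the Python standard 
library on a local file, not the DataNode write path, and write_durably is a 
hypothetical helper name.

```python
import os


def write_durably(path, data):
    """Append data and ask the OS to push it to the storage device."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()             # user-space buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> storage device
```

Without the os.fsync call, the data can sit in the page cache and be lost 
when the machine crashes, which is the situation described above for DN A.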

> Replica recovery doesn't distinguish between flushed-but-corrupted last chunk 
> and unflushed last chunk
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1103
>                 URL: https://issues.apache.org/jira/browse/HDFS-1103
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-1103-test.txt
>
>
> When the DN creates a replica under recovery, it calls validateIntegrity, 
> which truncates the last checksum chunk off of a replica if it is found to be 
> invalid. Then when the block recovery process happens, this shortened block 
> wins over a longer replica from another node where there was no corruption. 
> Thus, if just one of the DNs has an invalid last checksum chunk, data that 
> has been sync()ed to other datanodes can be lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
