[ https://issues.apache.org/jira/browse/HDFS-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866329#action_12866329 ]

Todd Lipcon commented on HDFS-1103:
-----------------------------------

I worked on this a bit, and think we have a few options:

1) Assume that corruption of the last checksum chunk always means the checksum 
itself is invalid (and the data possibly truncated), but that the bytes which 
made it into the data file are OK.

In this case, we can use the following algorithm:
- For each replica, run validateIntegrity to determine the length of data that 
matches the checksums
- Determine the maximum valid length across all replicas
- For any replica that has at least that number of bytes on disk (regardless of 
validity), include it in the synchronization
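
A minimal sketch of that selection logic, in Java; ReplicaInfo, bytesOnDisk, and 
checksumValidLength here are made-up stand-ins for whatever the recovery code 
actually tracks per replica, not the real HDFS types:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative per-replica state (hypothetical, not an HDFS class).
class ReplicaInfo {
  final long bytesOnDisk;          // raw length of the replica's data file
  final long checksumValidLength;  // length that validateIntegrity accepted

  ReplicaInfo(long bytesOnDisk, long checksumValidLength) {
    this.bytesOnDisk = bytesOnDisk;
    this.checksumValidLength = checksumValidLength;
  }
}

class Option1 {
  static List<ReplicaInfo> chooseReplicas(List<ReplicaInfo> replicas) {
    // Recovery length = maximum checksum-valid length across all replicas.
    long recoveryLength = 0;
    for (ReplicaInfo r : replicas) {
      recoveryLength = Math.max(recoveryLength, r.checksumValidLength);
    }
    // Include every replica with at least that many bytes on disk, even if
    // its own last chunk failed validation, since option 1 assumes those
    // data bytes are still good.
    List<ReplicaInfo> chosen = new ArrayList<ReplicaInfo>();
    for (ReplicaInfo r : replicas) {
      if (r.bytesOnDisk >= recoveryLength) {
        chosen.add(r);
      }
    }
    return chosen;
  }
}
{code}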

2) Assume that chunks that fail validation may be corrupt in either the checksum 
or the data

In this case, we take the max of the valid lengths across replicas, and only 
include a replica in the synchronization when it has at least this many valid 
bytes. Thus, in the case where we have two replicas, one of which is potentially 
corrupt at the end, we'd end up keeping only the valid one.
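
The same sketch under option 2's stricter rule (reusing the made-up ReplicaInfo 
from above); the only change is which length gets compared:

{code:java}
import java.util.ArrayList;
import java.util.List;

class Option2 {
  static List<ReplicaInfo> chooseReplicas(List<ReplicaInfo> replicas) {
    // Recovery length is still the maximum checksum-valid length...
    long recoveryLength = 0;
    for (ReplicaInfo r : replicas) {
      recoveryLength = Math.max(recoveryLength, r.checksumValidLength);
    }
    // ...but a replica only participates if its *valid* bytes reach that
    // length, so a replica with a corrupt last chunk is dropped rather
    // than trusted.
    List<ReplicaInfo> chosen = new ArrayList<ReplicaInfo>();
    for (ReplicaInfo r : replicas) {
      if (r.checksumValidLength >= recoveryLength) {
        chosen.add(r);
      }
    }
    return chosen;
  }
}
{code}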

We can still lose flushed data in the case of multiple datanode failures, if 
both of the DNs end up with corrupt last chunks (unlikely but possible).

3) Ignore checksums completely for the last chunk

We can entirely throw away the checksums on the last chunk, and augment 
ReplicaRecoveryInfo to actually include a BytesWritable of the last data chunk 
(only 512 bytes by default).

During the recovery operation, we can simply find the common prefix of data 
across all of the replicas, and synchronize to that length (recomputing 
checksum on each node).

Since we're only talking about one checksum chunk, and these are known to be 
small, I don't think it's problematic to shove the bytes in an RPC.
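
A sketch of the common-prefix computation for option 3, assuming for simplicity 
that all replicas agree on the fully-checksummed chunks and differ only in the 
raw bytes of their last chunk (the byte[]s stand in for the BytesWritable a 
widened ReplicaRecoveryInfo would carry):

{code:java}
class Option3 {
  /**
   * chunkStartOffset: offset in the block where the last chunk begins
   *                   (everything before it is assumed identical everywhere).
   * lastChunks:       raw last-chunk bytes shipped from each replica.
   * Returns the recovery length: the offset up to which all replicas agree
   * byte-for-byte; each node then truncates to it and recomputes the final
   * chunk's checksum locally.
   */
  static long commonPrefixLength(long chunkStartOffset, byte[][] lastChunks) {
    if (lastChunks.length == 0) {
      return chunkStartOffset;
    }
    int prefix = Integer.MAX_VALUE;
    for (byte[] chunk : lastChunks) {
      prefix = Math.min(prefix, chunk.length);
    }
    byte[] first = lastChunks[0];
    for (int i = 0; i < prefix; i++) {
      for (byte[] chunk : lastChunks) {
        if (chunk[i] != first[i]) {
          return chunkStartOffset + i; // first disagreement ends the prefix
        }
      }
    }
    return chunkStartOffset + prefix;
  }
}
{code}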


Thoughts?

> Replica recovery doesn't distinguish between flushed-but-corrupted last chunk 
> and unflushed last chunk
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1103
>                 URL: https://issues.apache.org/jira/browse/HDFS-1103
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>         Attachments: hdfs-1103-test.txt
>
>
> When the DN creates a replica under recovery, it calls validateIntegrity, 
> which truncates the last checksum chunk off of a replica if it is found to be 
> invalid. Then when the block recovery process happens, this shortened block 
> wins over a longer replica from another node where there was no corruption. 
> Thus, if just one of the DNs has an invalid last checksum chunk, data that 
> has been sync()ed to other datanodes can be lost.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
