[ https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503875#comment-13503875 ]

Kihwal Lee commented on HDFS-3875:
----------------------------------

This sounds like the symptom I mentioned in HDFS-3874. The tail node in the 
three-node pipeline detected a corruption, but its report failed due to 
HDFS-3874 and the node just went away. Since the last of the three nodes in 
the pipeline simply disappeared, the corrupt packet was acked with {SUCCESS, 
SUCCESS, FAIL}. The pipeline recreated from the remaining two nodes therefore 
ended up containing the corrupt portion of data.
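
To make this failure mode concrete, here is a minimal sketch (not the actual 
DFSOutputStream/ResponseProcessor code; the enum and class names are 
illustrative only) of how a client might map the per-datanode ack statuses to 
an errorIndex. With an ack of {SUCCESS, SUCCESS, ERROR_CHECKSUM}, the tail 
node that detected the corruption is the one excluded, while the two upstream 
nodes that already stored the corrupt packet stay in the rebuilt pipeline.

{code:java}
// Illustrative only: simplified ack handling, not the real client code.
enum Status { SUCCESS, ERROR, ERROR_CHECKSUM }

final class AckSketch {
    // Returns the index of the first datanode that did not report SUCCESS,
    // or -1 if every node acked cleanly.
    static int errorIndexFromAck(Status[] replies) {
        for (int i = 0; i < replies.length; i++) {
            if (replies[i] != Status.SUCCESS) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The scenario in this issue: the tail (index 2) detects the checksum
        // error, so it is the node that gets excluded, while the two upstream
        // nodes that already stored the corrupt packet stay in the pipeline.
        Status[] ack = { Status.SUCCESS, Status.SUCCESS, Status.ERROR_CHECKSUM };
        System.out.println("errorIndex = " + errorIndexFromAck(ack)); // prints 2
    }
}
{code}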

bq. Depending on the above, it would report back the errorIndex appropriately 
to the client, so that the correct faulty node is removed from the pipeline.

* This should cover the cases where a particular datanode corrupts data, IF 
the client checksum method and the storage checksum method are identical.

* If the two checksum methods are different, datanodes will have recalculated 
the checksum and written out the data along with their own checksum. Even if 
the incoming data was corrupt, it will appear okay on the disks of these nodes. 
The tail node can detect the corruption, but if it somehow terminates or its 
report gets ignored, no retrospective scan will tell us the integrity of the 
stored block, since the checksum may have been recreated to match the corrupted 
data. Maybe we should force datanodes to verify the checksum if the two 
checksum types are different (a sketch of this policy follows this list).

* Even if we don't have the above issue, special handling is needed for the 
case where the client is corrupting data. After recreating a pipeline, the same 
thing will repeat, since the client moves un-acked packets back to its data 
queue and resends them. Fail after trying twice? Or maybe the client should do 
a self integrity check of the packets in the ack queue if a corruption is 
detected at the first datanode (also sketched after this list).

* How will this work with reportBadBlocks() being called by the last node in 
the pipeline? The semantics of that method do not seem compatible with blocks 
that are being actively written and could still be recovered by calling 
recoverRbw().

* Given all these issues, simply failing/abandoning the block may be the 
easiest way out without missing any other possible corner cases. This will be 
even more convincing if we have any evidence showing that client-side 
corruption is the most common cause.
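
Regarding the differing-checksum-types point above, the following is a 
hypothetical sketch of the suggested verification policy, not the real 
BlockReceiver logic; the type and method names are made up for illustration. 
The idea is that a datanode verifies incoming data whenever the client's 
checksum type differs from the one it will store, since a locally recomputed 
checksum would otherwise hide upstream corruption.

{code:java}
// Hypothetical policy sketch, not actual HDFS code.
enum ChecksumType { CRC32, CRC32C, NULL }

final class VerifyPolicySketch {
    // Decide whether this datanode should verify the data it receives
    // against the client-supplied checksum before writing it out.
    static boolean shouldVerifyIncoming(ChecksumType clientType,
                                        ChecksumType storageType,
                                        boolean isTailOfPipeline) {
        // The tail already verifies today; the proposal is to also verify on
        // intermediate nodes whenever a checksum translation happens.
        return isTailOfPipeline || clientType != storageType;
    }
}
{code}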
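And here is a minimal sketch of the client-side self check mentioned above, 
under the assumption that each buffered packet still carries the checksum the 
client originally computed; the Packet class and method are illustrative, not 
the actual DFSOutputStream types. Before re-queueing un-acked packets for a 
rebuilt pipeline, the client re-verifies its own copies, so a client that is 
itself corrupting buffers fails fast instead of repeatedly poisoning new 
pipelines.

{code:java}
// Illustrative client-side self check, not actual HDFS code.
import java.util.zip.CRC32;

final class AckQueueSelfCheckSketch {
    static final class Packet {
        final byte[] data;
        final long expectedCrc; // computed when the packet was first built
        Packet(byte[] data, long expectedCrc) {
            this.data = data;
            this.expectedCrc = expectedCrc;
        }
    }

    // Returns true if every buffered packet still matches its own checksum.
    static boolean ackQueueIsSelfConsistent(Iterable<Packet> ackQueue) {
        for (Packet p : ackQueue) {
            CRC32 crc = new CRC32();
            crc.update(p.data, 0, p.data.length);
            if (crc.getValue() != p.expectedCrc) {
                return false; // the client's own copy is corrupt; abandon the block
            }
        }
        return true;
    }
}
{code}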
                
> Issue handling checksum errors in write pipeline
> ------------------------------------------------
>
>                 Key: HDFS-3875
>                 URL: https://issues.apache.org/jira/browse/HDFS-3875
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 2.0.2-alpha
>            Reporter: Todd Lipcon
>
> We saw this issue with one block in a large test cluster. The client is 
> storing the data with replication level 2, and we saw the following:
> - the second node in the pipeline detects a checksum error on the data it 
> received from the first node. We don't know if the client sent a bad 
> checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
> - this caused the second node to get kicked out of the pipeline, since it 
> threw an exception. The pipeline started up again with only one replica (the 
> first node in the pipeline)
> - this replica was later determined to be corrupt by the block scanner, and 
> unrecoverable since it is the only replica
