[ https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504260#comment-13504260 ]

Kihwal Lee commented on HDFS-3875:
----------------------------------

I don't think calling reportBadBlocks() alone does any good. Without the 
client knowing the details of the corruption, it won't be able to recover the 
block properly. reportBadBlocks() during create is only useful when the 
corruption is confined to one replica. If we get in-line corruption detection 
and recovery right, this call will not be needed during write operations.

If the meaning of the response in the data packet transfer is to be extended 
to cover packet corruption:

* A tail node should not ACK until the checksum of a packet is verified. 
Currently, an ack is enqueued before the checksum is verified, which in the 
case of the tail node causes immediate transmission of ACK/SUCCESS. (See the 
sketch after this list.)

* When the tail node is dropped from a pipeline, the other nodes should not 
simply ack with success, since that would mean the checksum was okay on those 
nodes.

* The portions that were ACK'ed with SUCCESS are guaranteed not to be 
corrupt. To be precise, there can be corruption on disk due to local issues, 
but not in the data each datanode received. I.e., any on-disk corruption must 
be an isolated corruption, not one caused by propagated corruption.
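
Here is a minimal sketch of the first point, in plain Java rather than the 
actual BlockReceiver code; the class, the Ack type, and receivePacketAsTail() 
are invented for illustration. It only shows the ordering being proposed: on 
the tail node the checksum is verified before any ack is enqueued, so 
ACK/SUCCESS is never sent for unverified data.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.zip.CRC32;

// Hypothetical sketch, not the real BlockReceiver.
public class TailNodePacketSketch {

  enum AckStatus { SUCCESS, CORRUPTION }

  static class Ack {
    final long seqno;
    final AckStatus status;
    Ack(long seqno, AckStatus status) { this.seqno = seqno; this.status = status; }
  }

  private final Queue<Ack> ackQueue = new ArrayDeque<>();

  // True if the CRC32 of the packet data matches the checksum sent with it.
  private static boolean verifyChecksum(byte[] data, long expectedCrc) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue() == expectedCrc;
  }

  // Tail-node receive path: verification happens BEFORE the ack is
  // enqueued, so ACK/SUCCESS is never sent for unverified data.
  void receivePacketAsTail(long seqno, byte[] data, long expectedCrc) {
    if (verifyChecksum(data, expectedCrc)) {
      ackQueue.add(new Ack(seqno, AckStatus.SUCCESS));
    } else {
      // No SUCCESS for a corrupt packet; the client redoes it.
      ackQueue.add(new Ack(seqno, AckStatus.CORRUPTION));
    }
  }
}
{code}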

For the second point, we could have datanodes verify checksums when they lose 
the mirror node or explicitly get ACK/CORRUPTION. But this can be simplified 
if we can guarantee that no ACK/SUCCESS is sent back when a corruption is 
detected in the packet or the mirror node is lost. We can just drop that 
portion of the data by not ACKing the corrupt packet, or by sending 
ACK/CORRUPTION back for it. I think the client will redo the un-ACK'ed 
packets in this case.
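
A hedged sketch of that guarantee as a pure decision function; combineAck() 
and DownstreamState are invented names, not HDFS APIs. The forwarded status 
is SUCCESS only when the local checksum verified and the mirror acked 
success; a lost mirror or a local corruption means no ACK/SUCCESS goes 
upstream, so the client redoes those packets.

{code:java}
// Hypothetical sketch of the ack-combining rule described above.
public class AckCombineSketch {

  enum Status { SUCCESS, CORRUPTION }

  // What an upstream node knows about its mirror for a given packet.
  enum DownstreamState { ACKED_SUCCESS, ACKED_CORRUPTION, LOST }

  // SUCCESS only if the local checksum verified AND the mirror acked
  // success; any other combination must not claim SUCCESS, since
  // SUCCESS guarantees the data was not corrupt on every acking node.
  static Status combineAck(boolean localChecksumOk, DownstreamState downstream) {
    if (localChecksumOk && downstream == DownstreamState.ACKED_SUCCESS) {
      return Status.SUCCESS;
    }
    return Status.CORRUPTION; // or simply withhold the ack
  }

  public static void main(String[] args) {
    // Mirror lost: do not blindly ack success even if local data verified.
    System.out.println(combineAck(true, DownstreamState.LOST));          // CORRUPTION
    // Healthy case: verified locally and downstream acked success.
    System.out.println(combineAck(true, DownstreamState.ACKED_SUCCESS)); // SUCCESS
  }
}
{code}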

The worst case is rewriting some packets. But the advantages are simplicity 
and avoiding checksum verification of already-written data.
                
> Issue handling checksum errors in write pipeline
> ------------------------------------------------
>
>                 Key: HDFS-3875
>                 URL: https://issues.apache.org/jira/browse/HDFS-3875
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, hdfs client
>    Affects Versions: 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> We saw this issue with one block in a large test cluster. The client is 
> storing the data with replication level 2, and we saw the following:
> - the second node in the pipeline detects a checksum error on the data it 
> received from the first node. We don't know if the client sent a bad 
> checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
> - this caused the second node to get kicked out of the pipeline, since it 
> threw an exception. The pipeline started up again with only one replica (the 
> first node in the pipeline).
> - this replica was later determined to be corrupt by the block scanner, and 
> is unrecoverable since it is the only replica.
