[
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504260#comment-13504260
]
Kihwal Lee commented on HDFS-3875:
----------------------------------
I don't think calling reportBadBlocks() alone does any good. Without the
client knowing the details of a corruption, it won't be able to recover the
block properly. reportBadBlocks() during create is only useful when a
corruption is confined to one replica. If we get the in-line corruption
detection and recovery right, this call will not be needed during write
operations.
If the meaning of the response in the data packet transfer is to be extended
to cover packet corruption:
* The tail node should not ACK until the checksum of a packet is verified.
Currently, an ack is enqueued before the checksum is verified, which in the
case of the tail node causes immediate transmission of ACK/SUCCESS (see the
sketch after this list).
* When the tail node is dropped from a pipeline, the other nodes should not
simply ack with SUCCESS, since that would mean the checksum was okay on those
nodes.
* The portions that were ACK'ed with SUCCESS are guaranteed not to be corrupt.
To be precise, there can be corruption on disk due to local issues, but not in
the data each datanode received. I.e., any on-disk corruption must be an
isolated corruption, not one caused by propagated corruption.
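To make the first point concrete, here is a minimal toy sketch of the proposed
ordering (plain Java, not actual BlockReceiver code; the Packet and Status
types and the CRC32 checksum are simplifications for illustration): the tail
node verifies the packet checksum first, and only then lets a SUCCESS ack go
back upstream.
{code:java}
import java.util.zip.CRC32;

/** Toy model of the proposed tail-node rule: verify before acking. */
class TailNodeSketch {
    enum Status { SUCCESS, ERROR_CHECKSUM }

    /** A packet carrying the checksum computed by the sender. */
    static class Packet {
        final long seqno;
        final byte[] data;
        final long checksum;
        Packet(long seqno, byte[] data, long checksum) {
            this.seqno = seqno;
            this.data = data;
            this.checksum = checksum;
        }
    }

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data, 0, data.length);
        return c.getValue();
    }

    /** Status the tail node would ack upstream for this packet. */
    static Status receiveAtTail(Packet pkt) {
        // Verify first; SUCCESS is only produced after the check passes,
        // so an ACK/SUCCESS can never cover a corrupt packet.
        if (crc(pkt.data) != pkt.checksum) {
            return Status.ERROR_CHECKSUM; // or simply withhold the ack
        }
        // (a real datanode would also persist the packet here)
        return Status.SUCCESS;
    }

    public static void main(String[] args) {
        byte[] data = "packet payload".getBytes();
        System.out.println(receiveAtTail(new Packet(1, data, crc(data))));
        System.out.println(receiveAtTail(new Packet(2, data, crc(data) ^ 1)));
    }
}
{code}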
For the second point, we could have datanodes verify checksums when they lose
the mirror node or explicitly get ACK/CORRUPTION. But this can be simplified
if we can guarantee that no ACK/SUCCESS is sent back when a corruption is
detected in the packet or the mirror node is lost. We can just drop that
portion of data by not ACKing the corrupt packet, or by sending
ACK/CORRUPTION back for it. I think the client will redo the un-ACK'ed
packets in this case.
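For the client side, here is a similarly hedged toy model (not DFSOutputStream
code; the queue names echo the real ones, but everything else is simplified):
packets move to an ack queue when sent, and anything not yet ACK'ed with
SUCCESS is pushed back to the front of the data queue to be rewritten through
the rebuilt pipeline.
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of replaying un-ACK'ed packets after a pipeline failure. */
class ResendSketch {
    private final Deque<Long> dataQueue = new ArrayDeque<>(); // to send
    private final Deque<Long> ackQueue = new ArrayDeque<>();  // sent, not acked

    void queue(long seqno) { dataQueue.addLast(seqno); }

    /** Send the next packet: it now waits for an ack. */
    Long send() {
        Long seqno = dataQueue.pollFirst();
        if (seqno != null) {
            ackQueue.addLast(seqno);
        }
        return seqno;
    }

    /** ACK/SUCCESS for the oldest outstanding packet: it is durable. */
    void ackSuccess() { ackQueue.pollFirst(); }

    /**
     * Corruption detected or pipeline broken: everything that was not
     * ACK'ed with SUCCESS goes back to the front of the data queue and
     * is simply rewritten once the pipeline is rebuilt.
     */
    void onPipelineFailure() {
        while (!ackQueue.isEmpty()) {
            dataQueue.addFirst(ackQueue.pollLast());
        }
    }

    public static void main(String[] args) {
        ResendSketch s = new ResendSketch();
        for (long i = 1; i <= 3; i++) s.queue(i);
        s.send(); s.send(); s.send();   // packets 1..3 in flight
        s.ackSuccess();                 // only packet 1 acked
        s.onPipelineFailure();          // 2 and 3 go back to dataQueue
        System.out.println("to resend: " + s.dataQueue);  // [2, 3]
    }
}
{code}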
The worst case is rewriting some packets, but the advantage is simplicity and
avoiding checksum verification of already-written data.
> Issue handling checksum errors in write pipeline
> ------------------------------------------------
>
> Key: HDFS-3875
> URL: https://issues.apache.org/jira/browse/HDFS-3875
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node, hdfs client
> Affects Versions: 2.0.2-alpha
> Reporter: Todd Lipcon
> Priority: Blocker
>
> We saw this issue with one block in a large test cluster. The client is
> storing the data with replication level 2, and we saw the following:
> - the second node in the pipeline detects a checksum error on the data it
> received from the first node. We don't know if the client sent a bad
> checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
> - this caused the second node to get kicked out of the pipeline, since it
> threw an exception. The pipeline started up again with only one replica (the
> first node in the pipeline).
> - this replica was later determined to be corrupt by the block scanner, and
> unrecoverable since it was the only replica.