[
https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268438#comment-15268438
]
Kai Zheng commented on HDFS-9833:
---------------------------------
Thanks Rakesh for your understanding. Let me explain this again in my own
words.
You're right: on the client side we need to try each datanode in the group and
ask it to do the block group checksum computing. This includes the datanodes of
both data blocks and parity blocks, because parity block datanodes can do the
same work. When a datanode in the group is asked to do the computing, it
requests/collects the checksums of all the blocks in the group and computes the
block-group-level checksum to respond to the client call. When all the blocks
are fine, the existing block checksums are simply fetched remotely/locally and
used; but if some data block is erased, a similar reconstruction task is
executed on the requested datanode to recompute the block checksum on the fly.
If that fails, the datanode returns a failure to the client instead of the
normal block group checksum. When the client receives a failure, it means the
requested datanode isn't able to do the work, so it retries with the next
datanode in the group.
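The client-side retry described above can be sketched roughly as follows. This
is illustrative only, not the actual HDFS API: the {{DatanodeStub}} interface,
method names, and checksum type are hypothetical stand-ins for the real RPC to
each datanode in the group.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the client-side retry: ask each datanode in the
// block group (data and parity datanodes alike) to compute the block group
// checksum, and move on to the next one when a datanode reports failure.
class GroupChecksumClient {

    // Stand-in for the per-datanode call; not a real HDFS interface.
    interface DatanodeStub {
        // Returns the block group checksum, or empty if this datanode
        // could not do the work (e.g. reconstruction failed).
        Optional<byte[]> computeGroupChecksum();
    }

    static Optional<byte[]> getGroupChecksum(List<DatanodeStub> group) {
        for (DatanodeStub dn : group) {
            Optional<byte[]> result = dn.computeGroupChecksum();
            if (result.isPresent()) {
                return result; // first datanode able to do the work wins
            }
            // Failure response: retry with the next datanode in the group.
        }
        return Optional.empty(); // every datanode in the group failed
    }
}
```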
> Erasure coding: recomputing block checksum on the fly by reconstructing the
> missed/corrupt block data
> -----------------------------------------------------------------------------------------------------
>
> Key: HDFS-9833
> URL: https://issues.apache.org/jira/browse/HDFS-9833
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Rakesh R
> Labels: hdfs-ec-3.0-must-do
>
> As discussed in HDFS-8430 and HDFS-9694, to compute a striped file checksum
> even when some of the striped blocks are missing, we need to consider
> recomputing the block checksum on the fly for the missed/corrupt blocks. To
> recompute the block checksum, the block data needs to be reconstructed by
> erasure decoding, and the main code needed for the block reconstruction could
> be borrowed from HDFS-9719, the refactoring of the existing
> {{ErasureCodingWorker}}. In the EC worker, reconstructed blocks need to be
> written out to target datanodes, but in this case the remote writing isn't
> necessary, as the reconstructed block data is only used to recompute the
> checksum.
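The idea of reconstructing erased block data purely in memory to recompute its
checksum can be sketched as below. This is a simplification under stated
assumptions: HDFS erasure coding uses Reed-Solomon codecs, but a single XOR
parity stands in here to keep the decoding step short, and CRC32 stands in for
the block checksum. The class and method names are illustrative, not HDFS code.

```java
import java.util.zip.CRC32;

// Illustrative only: rebuild an erased block from the surviving blocks of its
// group and recompute its checksum in memory, without writing the
// reconstructed block out to any target datanode.
class OnTheFlyChecksum {

    // Decode the one missing block by XOR-ing the survivors together.
    // (With a single XOR parity, the erased block is the XOR of the rest;
    // real HDFS EC would run a Reed-Solomon decode here instead.)
    static byte[] reconstruct(byte[][] survivors, int blockLen) {
        byte[] out = new byte[blockLen];
        for (byte[] b : survivors) {
            for (int i = 0; i < blockLen; i++) {
                out[i] ^= b[i];
            }
        }
        return out;
    }

    // Recompute the block checksum from the reconstructed bytes only;
    // the bytes themselves are then discarded.
    static long checksumOf(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block);
        return crc.getValue();
    }
}
```

The key point matches the description above: the reconstructed bytes feed the
checksum computation and are never persisted.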
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)