[
https://issues.apache.org/jira/browse/HDFS-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268438#comment-15268438
]
Kai Zheng commented on HDFS-9833:
---------------------------------
Thanks Rakesh for your understanding. Let me explain this again in my own
words.
You're right: on the client side we need to try each datanode in the group and
ask it to do the block group checksum computing. This includes the datanodes of
both data blocks and parity blocks, because parity block datanodes can do the
same work. When a datanode in the group is asked to do the computing, it
requests/collects the checksums of all the blocks in the group and computes the
block-group-level checksum to respond to the client call. When all the blocks
are fine, the existing block checksums are simply fetched remotely/locally and
used; but if some data block is erased, a similar reconstruction task is
executed on the requested datanode to recompute the block checksum on the fly.
If that fails, the datanode returns a failure to the client instead of the
normal block group checksum. When the client receives a failure, it means the
requested datanode isn't able to do the work, so it retries with the next
datanode in the group.
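The client-side retry described above can be sketched roughly as follows. This
is illustrative only, not the actual HDFS API: the {{DatanodeStub}} interface,
method names, and checksum type are hypothetical stand-ins for the real RPC to
each datanode in the group.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the client-side retry: ask each datanode in the
// block group (data and parity datanodes alike) to compute the block group
// checksum, and move on to the next one when a datanode reports failure.
class GroupChecksumClient {

    // Stand-in for the per-datanode call; not a real HDFS interface.
    interface DatanodeStub {
        // Returns the block group checksum, or empty if this datanode
        // could not do the work (e.g. reconstruction failed).
        Optional<byte[]> computeGroupChecksum();
    }

    static Optional<byte[]> getGroupChecksum(List<DatanodeStub> group) {
        for (DatanodeStub dn : group) {
            Optional<byte[]> result = dn.computeGroupChecksum();
            if (result.isPresent()) {
                return result; // first datanode able to do the work wins
            }
            // Failure response: retry with the next datanode in the group.
        }
        return Optional.empty(); // every datanode in the group failed
    }
}
```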
> Erasure coding: recomputing block checksum on the fly by reconstructing the
> missed/corrupt block data
> -----------------------------------------------------------------------------------------------------
>
> Key: HDFS-9833
> URL: https://issues.apache.org/jira/browse/HDFS-9833
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Kai Zheng
> Assignee: Rakesh R
> Labels: hdfs-ec-3.0-must-do
>
> As discussed in HDFS-8430 and HDFS-9694, to compute a striped file checksum
> even when some of the striped blocks are missing, we need to consider
> recomputing the block checksum on the fly for the missed/corrupt blocks. To
> recompute the block checksum, the block data needs to be reconstructed by
> erasure decoding, and the main code needed for the block reconstruction could
> be borrowed from HDFS-9719, the refactoring of the existing
> {{ErasureCodingWorker}}. In the EC worker, reconstructed blocks need to be
> written out to target datanodes, but in this case the remote writing isn't
> necessary, as the reconstructed block data is only used to recompute the
> checksum.
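The idea of reconstructing erased block data purely in memory to recompute its
checksum can be sketched as below. This is a simplification under stated
assumptions: HDFS erasure coding uses Reed-Solomon codecs, but a single XOR
parity stands in here to keep the decoding step short, and CRC32 stands in for
the block checksum. The class and method names are illustrative, not HDFS code.

```java
import java.util.zip.CRC32;

// Illustrative only: rebuild an erased block from the surviving blocks of its
// group and recompute its checksum in memory, without writing the
// reconstructed block out to any target datanode.
class OnTheFlyChecksum {

    // Decode the one missing block by XOR-ing the survivors together.
    // (With a single XOR parity, the erased block is the XOR of the rest;
    // real HDFS EC would run a Reed-Solomon decode here instead.)
    static byte[] reconstruct(byte[][] survivors, int blockLen) {
        byte[] out = new byte[blockLen];
        for (byte[] b : survivors) {
            for (int i = 0; i < blockLen; i++) {
                out[i] ^= b[i];
            }
        }
        return out;
    }

    // Recompute the block checksum from the reconstructed bytes only;
    // the bytes themselves are then discarded.
    static long checksumOf(byte[] block) {
        CRC32 crc = new CRC32();
        crc.update(block);
        return crc.getValue();
    }
}
```

The key point matches the description above: the reconstructed bytes feed the
checksum computation and are never persisted.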
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)