[
https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074611#comment-15074611
]
Kai Zheng commented on HDFS-8430:
---------------------------------
bq. Client should get CRCs from DNs, and sum at client side.
With the CRCs for all the blocks in a group in hand on the client side, we
can split the CRC data into units of the CRC checksum size, group these units
by striping cell size, reorder the groups into the order they would have in
replication form, and perform the MD5MD5 computation exactly as replication
does. This should generate the same file checksum result. I think this
approach is worth a try. There are two issues involved, though:
* Generating a file checksum depends on the block size. A block size value
that makes sense for replication form may not make sense for the striping
layout. To compare a file in striping form with one in replication form via
the file checksum, we need to use the same block size value for both forms.
* The first-level CRC data for a block needs to be retrieved from the DN. The
network traffic is larger than for the original MD5 value, but I guess it is
still reasonable. We need to add an RPC method like {{getBlockCrcChecksum}} to
retrieve it. It's useful for striping, but may only become useful for
replication or other forms in the future.
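To make the reordering idea concrete, here is a minimal sketch, not the actual
patch. It assumes the client already holds the per-chunk CRCs of each data
block in a group, in the order each DN stores them, and that the cell size is
a multiple of the bytes-per-CRC value. The class {{StripedChecksumSketch}}, the
methods {{toLogicalOrder}} and {{md5OfMd5s}}, and their parameters are all
hypothetical names. The final digest is simplified as well: it only hashes the
per-block MD5s and leaves out any other fields the real file checksum carries.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch; none of these names are existing HDFS APIs. */
final class StripedChecksumSketch {

  /**
   * Reorders the chunk CRCs of one block group into logical file order.
   * crcsPerBlock[i] holds the CRCs of data block i as stored on its DN,
   * i.e. cell i, then cell i + k, and so on for k data blocks.
   * crcsPerCell is cellSize / bytesPerCrc (assumed to divide evenly).
   */
  static int[] toLogicalOrder(int[][] crcsPerBlock, int crcsPerCell) {
    if (crcsPerCell <= 0) {
      throw new IllegalArgumentException("crcsPerCell must be positive");
    }
    int total = 0;
    for (int[] block : crcsPerBlock) {
      total += block.length;
    }
    int[] logical = new int[total];
    int pos = 0;
    // Walk stripe by stripe; within a stripe take one cell from each block.
    for (int offset = 0; pos < total; offset += crcsPerCell) {
      for (int[] block : crcsPerBlock) {
        // The last stripe may be partial, so clamp to the block's length.
        int end = Math.min(offset + crcsPerCell, block.length);
        for (int i = offset; i < end; i++) {
          logical[pos++] = block[i];
        }
      }
    }
    return logical;
  }

  /**
   * Simplified MD5-of-MD5s: cut the logical CRC sequence into blocks of
   * crcsPerReplicatedBlock entries, MD5 each block's CRC bytes, then MD5
   * the concatenation of those digests.
   */
  static byte[] md5OfMd5s(int[] logicalCrcs, int crcsPerReplicatedBlock) {
    if (crcsPerReplicatedBlock <= 0) {
      throw new IllegalArgumentException("block size must be positive");
    }
    try {
      MessageDigest fileMd5 = MessageDigest.getInstance("MD5");
      for (int start = 0; start < logicalCrcs.length;
           start += crcsPerReplicatedBlock) {
        int end = Math.min(start + crcsPerReplicatedBlock, logicalCrcs.length);
        // Serialize each CRC as 4 big-endian bytes (ByteBuffer's default).
        ByteBuffer buf = ByteBuffer.allocate(4 * (end - start));
        for (int i = start; i < end; i++) {
          buf.putInt(logicalCrcs[i]);
        }
        MessageDigest blockMd5 = MessageDigest.getInstance("MD5");
        fileMd5.update(blockMd5.digest(buf.array()));
      }
      return fileMd5.digest();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide MD5.
      throw new IllegalStateException(e);
    }
  }
}
```

For example, with 3 data blocks and 2 CRCs per cell, a call such as
{{md5OfMd5s(toLogicalOrder(crcs, 2), crcsPerReplicatedBlock)}} would yield the
digest to compare against a replicated copy, provided both sides use the same
block size value, as noted in the first issue above.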
I'm prototyping this approach to verify the idea and show the possible changes,
for further discussion. Please let me know if I missed anything. Thanks.
> Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
> -------------------------------------------------------------------------
>
> Key: HDFS-8430
> URL: https://issues.apache.org/jira/browse/HDFS-8430
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7285
> Reporter: Walter Su
> Assignee: Kai Zheng
>
> HADOOP-3981 introduces a distributed file checksum algorithm. It's designed
> for replicated blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so it can work for striped
> block groups.