[
https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081045#comment-15081045
]
Kai Zheng commented on HDFS-8430:
---------------------------------
I see. So computing checksums at the cell level instead of the block level
would produce even more intermediate checksum data. Either way, as Walter said
above, we would still need to have all the cell/block checksum results in hand
so that we can sum them in the desired order, even if the code used is linear,
in order to generate the same final result. You mentioned that we could use a
DataNode to compute the final result, but that looks like overkill for this
functionality, or am I missing something here? Is there any existing facility
that supports doing it this way? That is, the client instructs the other 5
DataNodes to send some data to the lead DataNode, and the lead aggregates the
results to respond to the client's request.
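To make the ordering concern concrete, here is a minimal sketch of the
aggregation step, wherever it ends up running (client or lead DataNode). It is
not HDFS code: the class {{StripeChecksumSketch}}, the methods {{combine}} and
{{fake}}, the use of MD5 over the partial results, and the count of six
(the lead plus the other 5 DataNodes) are all assumptions made for
illustration only. It shows that the partial results may arrive in any order,
but all of them must be in hand before the final digest can be produced.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in HDFS.
final class StripeChecksumSketch {

  // Digests the partial checksums in index order 0..expectedCount-1.
  // Fails if any partial result has not arrived yet.
  static byte[] combine(Map<Integer, byte[]> partials, int expectedCount) {
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    for (int i = 0; i < expectedCount; i++) {
      byte[] partial = partials.get(i);
      if (partial == null) {
        throw new IllegalStateException("missing partial checksum for index " + i);
      }
      md5.update(partial);
    }
    return md5.digest();
  }

  // Stand-in for a per-block (or per-cell) checksum returned by a DataNode.
  static byte[] fake(int index) {
    return ("partial-" + index).getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    int n = 6; // assumed: the lead plus the other 5 DataNodes
    Map<Integer, byte[]> inOrder = new HashMap<>();
    for (int i = 0; i < n; i++) {
      inOrder.put(i, fake(i));
    }
    Map<Integer, byte[]> shuffled = new HashMap<>();
    for (int i : new int[] {4, 1, 5, 0, 3, 2}) {
      shuffled.put(i, fake(i)); // arrival order differs from index order
    }
    // Same final digest regardless of arrival order.
    System.out.println(Arrays.equals(combine(inOrder, n), combine(shuffled, n)));
  }
}
```

Keying the partial results by index is what lets the aggregator ignore arrival
order; the cost is that it has to buffer every partial result first, which is
the intermediate data mentioned above.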
Thanks.
> Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
> -------------------------------------------------------------------------
>
> Key: HDFS-8430
> URL: https://issues.apache.org/jira/browse/HDFS-8430
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7285
> Reporter: Walter Su
> Assignee: Kai Zheng
> Attachments: HDFS-8430-poc1.patch
>
>
> HADOOP-3981 introduced a distributed file checksum algorithm. It was designed
> for replicated blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so that it can work for
> striped block groups.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)