[ 
https://issues.apache.org/jira/browse/HDFS-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073535#comment-15073535
 ] 

Walter Su commented on HDFS-8430:
---------------------------------

The MD5-of-xxxMD5-of-yyyCRC32 checksum is computed from existing block 
metadata, so it depends on bytes.per.checksum and block size.
Now that we have the EC feature, it depends on bytes.per.checksum, block size, 
and block layout.

bytes.per.checksum is less likely to change across version upgrades, so I think 
the two clusters in a DistCp run probably have the same bpc.
If we want the checksum to be unaffected by block size and block layout, the 
client should get the CRCs from the DNs and combine them on the client side. 
That changes the existing implementation and increases network traffic, but if 
it lets us avoid transferring files between clusters, I think we can bear the 
cost.
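
To illustrate the trade-off, here is a rough, self-contained sketch (not the 
actual DFSClient or DataNode code). It assumes a simplified scheme: one CRC32 
per bytes.per.checksum chunk, one MD5 per block over that block's CRCs, and one 
MD5 over the block MD5s. The class and method names (ChecksumSketch, chunkCrcs, 
composedDigest, clientSideDigest) are hypothetical, and the byte layout is not 
meant to match the real wire format.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

class ChecksumSketch {
    static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for what a DN holds for one block: the CRC32 of every
    // bpc-sized chunk in data[off, off + len), 4 bytes each.
    static byte[] chunkCrcs(byte[] data, int off, int len, int bpc) {
        ByteBuffer out = ByteBuffer.allocate(4 * ((len + bpc - 1) / bpc));
        for (int p = off; p < off + len; p += bpc) {
            CRC32 crc = new CRC32();
            crc.update(data, p, Math.min(bpc, off + len - p));
            out.putInt((int) crc.getValue());
        }
        return out.array();
    }

    // Current style: MD5 per block, then MD5 over the block MD5s.
    // The result changes when the block boundaries change.
    static byte[] composedDigest(byte[] data, int blockSize, int bpc) {
        MessageDigest outer = md5();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            outer.update(md5().digest(chunkCrcs(data, off, len, bpc)));
        }
        return outer.digest();
    }

    // Proposed style (assumption): the client fetches the raw CRCs of every
    // block and digests them in file order, so block boundaries drop out
    // as long as blockSize is a multiple of bpc.
    static byte[] clientSideDigest(byte[] data, int blockSize, int bpc) {
        MessageDigest all = md5();
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            all.update(chunkCrcs(data, off, len, bpc));
        }
        return all.digest();
    }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i * 31);
        System.out.println("composed digests equal:    " + Arrays.equals(
            composedDigest(data, 1024, 512), composedDigest(data, 2048, 512)));
        System.out.println("client-side digests equal: " + Arrays.equals(
            clientSideDigest(data, 1024, 512), clientSideDigest(data, 2048, 512)));
    }
}
```

In this sketch the client-side variant moves 4 bytes per chunk to the client 
instead of one MD5 per block, which is the extra network traffic mentioned 
above. It still depends on bpc, since the chunk boundaries decide which bytes 
each CRC covers.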

I haven't started. Please feel free to take it.

> Erasure coding: update DFSClient.getFileChecksum() logic for stripe files
> -------------------------------------------------------------------------
>
>                 Key: HDFS-8430
>                 URL: https://issues.apache.org/jira/browse/HDFS-8430
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7285
>            Reporter: Walter Su
>            Assignee: Walter Su
>
> HADOOP-3981 introduces a distributed file checksum algorithm. It's designed 
> for replicated blocks.
> {{DFSClient.getFileChecksum()}} needs some updates so that it can work for 
> striped block groups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
