[ https://issues.apache.org/jira/browse/HADOOP-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HADOOP-10295:
-------------------------------

    Attachment: HADOOP-10295.002.patch

Thanks for the comments, Kihwal and Sangjin! This 002 patch is based on my 
001 patch and Laurent's patch, and it also preserves the block size when 
processing the option that preserves the checksum type.

I've tested the patch in my local cluster. In my test I generated some files 
with different checksum types and ran distcp with and without "-pc". 
The distcp succeeded when -pc was enabled.

> Allow distcp to automatically identify the checksum type of source files and 
> use it for the target
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10295
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HADOOP-10295.000.patch, HADOOP-10295.002.patch, 
> hadoop-10295.patch
>
>
> Currently, when running distcp, users can use "-Ddfs.checksum.type" to specify 
> the checksum type in the target FS. This works fine if all the source files 
> use the same checksum type. If files in the source cluster have mixed 
> checksum types, users have to either use "-skipcrccheck" or hit a checksum 
> mismatch exception. We may therefore want to add a new option to 
> distcp so that it can automatically identify the original checksum type of 
> each source file and use the same checksum type in the target FS. 
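The mismatch described above can be illustrated with a simplified model. HDFS reports file checksums in an MD5-of-MD5-of-CRC form: a CRC per fixed-size chunk (512 bytes by default), an MD5 over each block's CRCs, and an MD5 over the concatenated block MD5s. The Python sketch below only approximates that structure under those assumptions and is not HDFS code; `crc32c`, `CRC_FUNCS`, and `composite_checksum` are hypothetical helpers. It shows that identical bytes produce different composite checksums when either the CRC type or the block size differs, which suggests why preserving the block size alongside the checksum type matters for the post-copy comparison.

```python
import hashlib
import struct
import zlib


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


# Hypothetical mapping from checksum-type name to a per-chunk CRC function.
CRC_FUNCS = {
    "CRC32": lambda chunk: zlib.crc32(chunk) & 0xFFFFFFFF,
    "CRC32C": crc32c,
}


def composite_checksum(data: bytes, crc_type: str, block_size: int,
                       bytes_per_crc: int = 512) -> str:
    """Simplified MD5-of-MD5-of-CRC model (assumption, not the HDFS implementation)."""
    block_md5s = b""
    for b in range(0, len(data), block_size):
        block = data[b:b + block_size]
        # One 4-byte big-endian CRC per chunk of the block.
        crcs = b"".join(
            struct.pack(">I", CRC_FUNCS[crc_type](block[c:c + bytes_per_crc]))
            for c in range(0, len(block), bytes_per_crc)
        )
        block_md5s += hashlib.md5(crcs).digest()  # per-block MD5 of its CRCs
    return hashlib.md5(block_md5s).hexdigest()    # MD5 over all block MD5s


data = bytes(i % 251 for i in range(4096))  # 4 KiB of sample content
src = composite_checksum(data, "CRC32C", block_size=2048)
print(src == composite_checksum(data, "CRC32", block_size=2048))   # False: CRC type differs
print(src == composite_checksum(data, "CRC32C", block_size=1024))  # False: block size differs
print(src == composite_checksum(data, "CRC32C", block_size=2048))  # True: both preserved
```

In this model, a copy compares equal only when both parameters match the source, which is the situation the "-pc" option aims to create per file.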



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
