[ 
https://issues.apache.org/jira/browse/HADOOP-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883830#comment-13883830
 ] 

Laurent Goujon commented on HADOOP-10295:
-----------------------------------------

Funny, I have been preparing a patch for this very same issue for a week.

Some comments regarding your patch:
* Instead of a new command-line option, it may be better to extend the
FileAttribute enum.
* MD5MD5CRC32GzipFileChecksum and MD5MD5CRC32CastagnoliFileChecksum are
probably HDFS-specific (even though they are available in hadoop-common). I
opened HADOOP-10297 to add {{FileChecksum.getChecksumOpt()}}.
* Instead of doing two instanceof checks, it is possible to check against their
common superclass, MD5MD5CRC32FileChecksum.
* EnumSet.of(CreateFlag.OVERWRITE) is not equivalent to setting the overwrite
argument to true. In DistributedFileSystem, the equivalent is
EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE).
* In my opinion, a test checking that the option actually works would be nice
to have.
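A minimal sketch of the superclass and CreateFlag points, under stated assumptions: the nested classes below are self-contained stand-ins named after the Hadoop types mentioned in the comment, not the real org.apache.hadoop classes, and the CrcType enum and getCrcType() accessor are assumptions made for illustration. It shows one instanceof check against the shared superclass covering both subclasses, and the boolean-to-flags mapping that the comment attributes to DistributedFileSystem.

```java
import java.util.EnumSet;

public class ChecksumReviewSketch {

    // Hypothetical stand-in for the CRC algorithm a file checksum was built from.
    enum CrcType { CRC32, CRC32C }

    // Stand-ins named after the Hadoop classes in the review; NOT the real
    // org.apache.hadoop.fs types, which carry far more state.
    static class FileChecksum { }

    static class MD5MD5CRC32FileChecksum extends FileChecksum {
        private final CrcType crcType;

        MD5MD5CRC32FileChecksum(CrcType crcType) {
            this.crcType = crcType;
        }

        // Accessor assumed for this sketch.
        CrcType getCrcType() {
            return crcType;
        }
    }

    static class MD5MD5CRC32GzipFileChecksum extends MD5MD5CRC32FileChecksum {
        MD5MD5CRC32GzipFileChecksum() {
            super(CrcType.CRC32);
        }
    }

    static class MD5MD5CRC32CastagnoliFileChecksum extends MD5MD5CRC32FileChecksum {
        MD5MD5CRC32CastagnoliFileChecksum() {
            super(CrcType.CRC32C);
        }
    }

    // Only the two flags discussed in the review are modeled here.
    enum CreateFlag { CREATE, OVERWRITE }

    // A single instanceof check against the common superclass handles both
    // subclasses. Returns null for any other kind of checksum.
    static CrcType sourceCrcType(FileChecksum checksum) {
        if (checksum instanceof MD5MD5CRC32FileChecksum) {
            return ((MD5MD5CRC32FileChecksum) checksum).getCrcType();
        }
        return null;
    }

    // Per the review comment, overwrite=true corresponds to CREATE plus
    // OVERWRITE, not OVERWRITE alone.
    static EnumSet<CreateFlag> createFlags(boolean overwrite) {
        return overwrite
            ? EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE)
            : EnumSet.of(CreateFlag.CREATE);
    }

    public static void main(String[] args) {
        FileChecksum[] samples = {
            new MD5MD5CRC32GzipFileChecksum(),
            new MD5MD5CRC32CastagnoliFileChecksum(),
            new FileChecksum()
        };
        for (FileChecksum c : samples) {
            System.out.println(c.getClass().getSimpleName() + " -> " + sourceCrcType(c));
        }
        System.out.println("overwrite=true  -> " + createFlags(true));
        System.out.println("overwrite=false -> " + createFlags(false));
    }
}
```

Checking the superclass also means a later MD5MD5CRC32FileChecksum subclass would be picked up without adding another branch.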

Since I also have a patch, I'll attach it to this ticket too, and let a Hadoop
maintainer help us sort them out :)

> Allow distcp to automatically identify the checksum type of source files and 
> use it for the target
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10295
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HADOOP-10295.000.patch
>
>
> Currently, while running distcp, users can use "-Ddfs.checksum.type" to specify
> the checksum type in the target FS. This works fine if all the source files
> use the same checksum type. If files in the source cluster have mixed
> checksum types, users have to either use "-skipcrccheck" or hit a checksum
> mismatch exception. Thus we may need to consider adding a new option to
> distcp so that it can automatically identify the original checksum type of
> each source file and use the same checksum type in the target FS.
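The mixed-checksum scenario in the description, combined with the review suggestion to extend the FileAttribute enum instead of adding a new command-line option, can be modeled with a small self-contained sketch. Everything here is hypothetical: the FileAttribute constants, the letter 'c' for CHECKSUMTYPE, and the SourceFile type are illustrations, not DistCp's actual code. Only the CRC type is modeled; block size and bytes per checksum also affect a file checksum but are ignored.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class PreserveChecksumTypeSketch {

    // Hypothetical preservable attributes, each selected by one letter.
    // CHECKSUMTYPE and its letter 'c' are assumptions for this sketch.
    enum FileAttribute {
        REPLICATION('r'), BLOCKSIZE('b'), USER('u'), GROUP('g'),
        PERMISSION('p'), CHECKSUMTYPE('c');

        final char letter;

        FileAttribute(char letter) {
            this.letter = letter;
        }

        // Parses a letter string such as "ugc" into a set of attributes.
        static EnumSet<FileAttribute> parse(String letters) {
            EnumSet<FileAttribute> result = EnumSet.noneOf(FileAttribute.class);
            for (char ch : letters.toCharArray()) {
                FileAttribute match = null;
                for (FileAttribute a : values()) {
                    if (a.letter == ch) {
                        match = a;
                        break;
                    }
                }
                if (match == null) {
                    throw new IllegalArgumentException("Unknown attribute: " + ch);
                }
                result.add(match);
            }
            return result;
        }
    }

    enum ChecksumType { CRC32, CRC32C }

    static final class SourceFile {
        final String path;
        final ChecksumType type;

        SourceFile(String path, ChecksumType type) {
            this.path = path;
            this.type = type;
        }
    }

    // Type the copy writes for one file: the source's own type when
    // CHECKSUMTYPE is preserved, otherwise one job-wide type (the role
    // "-Ddfs.checksum.type" plays in the description).
    static ChecksumType targetType(SourceFile f, EnumSet<FileAttribute> preserve,
                                   ChecksumType jobWide) {
        return preserve.contains(FileAttribute.CHECKSUMTYPE) ? f.type : jobWide;
    }

    // Paths whose post-copy checksum comparison would fail because the
    // target was written with a different CRC type than the source.
    static List<String> mismatches(List<SourceFile> files,
                                   EnumSet<FileAttribute> preserve,
                                   ChecksumType jobWide) {
        List<String> bad = new ArrayList<>();
        for (SourceFile f : files) {
            if (targetType(f, preserve, jobWide) != f.type) {
                bad.add(f.path);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<SourceFile> files = List.of(
            new SourceFile("/data/a", ChecksumType.CRC32),
            new SourceFile("/data/b", ChecksumType.CRC32C));
        System.out.println(mismatches(files, FileAttribute.parse("ug"), ChecksumType.CRC32C));
        System.out.println(mismatches(files, FileAttribute.parse("ugc"), ChecksumType.CRC32C));
    }
}
```

With a single job-wide type, the file whose source type differs is reported as a mismatch; preserving the per-file type leaves nothing to skip or fail.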



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
