[ https://issues.apache.org/jira/browse/HADOOP-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883830#comment-13883830 ]
Laurent Goujon commented on HADOOP-10295:
-----------------------------------------

Funny, I have been preparing a patch for this very same issue for a week.

Some comments regarding your patch:
* Instead of a new command-line option, it may be better to extend the FileAttribute enum.
* MD5MD5CRC32GzipFileChecksum and MD5MD5CRC32CastagnoliFileChecksum are probably HDFS-specific (although they are available in hadoop-common). I opened HADOOP-10297 to add {{FileChecksum.getChecksumOpt()}}.
* Instead of doing two instanceof checks, it is possible to use the superclass MD5MD5CRC32FileChecksum.
* EnumSet.of(CreateFlag.OVERWRITE) is not equivalent to setting the overwrite argument to true. In DistributedFileSystem, the equivalent is EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE).
* A test checking that the option actually works would be nice to have (in my opinion).

Since I also have a patch, I'll attach it to this ticket too, and let a Hadoop maintainer help us sort them out :)

> Allow distcp to automatically identify the checksum type of source files and
> use it for the target
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10295
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HADOOP-10295.000.patch
>
>
> Currently while doing distcp, users can use "-Ddfs.checksum.type" to specify
> the checksum type in the target FS. This works fine if all the source files
> are using the same checksum type. If files in the source cluster have mixed
> types of checksum, users have to either use "-skipcrccheck" or have checksum
> mismatching exception. Thus we may need to consider adding a new option to
> distcp so that it can automatically identify the original checksum type of
> each source file and use the same checksum type in the target FS.
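
For illustration, the two review points about the instanceof checks and the create flags can be sketched as below. This is only a sketch and not code from either patch: the nested classes and the CreateFlag enum are simplified stand-ins for the Hadoop types of the same names, and crcType(), checksumTypeOf() and flagsFor() are hypothetical helpers. The overwrite=false branch is also an assumption; the comment above only describes the overwrite=true case.

```java
import java.util.EnumSet;

public class ChecksumTypeSketch {
    // Stand-in for Hadoop's CreateFlag, limited to the two flags discussed.
    enum CreateFlag { CREATE, OVERWRITE }

    // Stand-ins for the checksum hierarchy: both concrete checksum
    // classes share MD5MD5CRC32FileChecksum as their superclass.
    static class FileChecksum {}

    static class MD5MD5CRC32FileChecksum extends FileChecksum {
        // Hypothetical accessor used only for this illustration.
        String crcType() { return "CRC32"; }
    }

    static class MD5MD5CRC32GzipFileChecksum extends MD5MD5CRC32FileChecksum {}

    static class MD5MD5CRC32CastagnoliFileChecksum extends MD5MD5CRC32FileChecksum {
        @Override
        String crcType() { return "CRC32C"; }
    }

    // A single instanceof check against the superclass covers both
    // subclasses, so no per-subclass branching is needed.
    static String checksumTypeOf(FileChecksum checksum) {
        if (checksum instanceof MD5MD5CRC32FileChecksum) {
            return ((MD5MD5CRC32FileChecksum) checksum).crcType();
        }
        return null; // some other kind of checksum, type unknown here
    }

    // overwrite=true corresponds to CREATE plus OVERWRITE, not OVERWRITE
    // alone. The false branch (CREATE only) is an assumption.
    static EnumSet<CreateFlag> flagsFor(boolean overwrite) {
        return overwrite
                ? EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE)
                : EnumSet.of(CreateFlag.CREATE);
    }

    public static void main(String[] args) {
        System.out.println(checksumTypeOf(new MD5MD5CRC32GzipFileChecksum()));
        System.out.println(checksumTypeOf(new MD5MD5CRC32CastagnoliFileChecksum()));
        System.out.println(flagsFor(true));
    }
}
```

The point of flagsFor() is that a set containing only OVERWRITE omits CREATE, so it does not express the same request as overwrite=true.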
-- This message was sent by Atlassian JIRA (v6.1.5#6160)