[ 
https://issues.apache.org/jira/browse/HADOOP-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883865#comment-13883865
 ] 

Laurent Goujon commented on HADOOP-10295:
-----------------------------------------

{quote}
I personally like your idea in HADOOP-10297. That can simplify the logic there. 
However, FileChecksum is a public API marked as stable, to add a new abstract 
method there may cause incompatibility (e.g., other ppl may have implemented 
their own FileChecksum). A workaround here can be adding getChecksumOpt() to 
FileChecksum and let it return null.
{quote}
Yes, my patch for HADOOP-10297 breaks source compatibility (but not binary 
compatibility). That may be okay for the next major Hadoop version, but probably 
not for a minor version. I am waiting for some guidance here (and it's really 
easy to change).
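
To make the trade-off concrete, here is a minimal sketch of the workaround 
suggested in the quote above: the new accessor is a concrete method that 
returns null rather than an abstract one, so subclasses written before it 
existed still compile. Everything in it is a hypothetical stand-in 
(SketchFileChecksum, LegacyChecksum, Crc32cChecksum, targetChecksumType, and 
a plain String in place of the real option type); it is not the actual 
org.apache.hadoop.fs code or either patch.

```java
// Hypothetical stand-in for a stable public base class; NOT the real Hadoop class.
abstract class SketchFileChecksum {
    abstract String getAlgorithmName();

    // Concrete rather than abstract: existing subclasses keep compiling,
    // and callers must treat null as "checksum option unknown".
    String getChecksumOpt() {
        return null;
    }
}

// A third-party subclass written before getChecksumOpt() existed (assumed).
class LegacyChecksum extends SketchFileChecksum {
    @Override
    String getAlgorithmName() {
        return "LEGACY";
    }
}

// A newer subclass that opts in by overriding the accessor (assumed).
class Crc32cChecksum extends SketchFileChecksum {
    @Override
    String getAlgorithmName() {
        return "CRC32C-composite";
    }

    @Override
    String getChecksumOpt() {
        return "CRC32C";
    }
}

class ChecksumOptSketch {
    // Pick the checksum type for the target file: reuse the source's type
    // when it is known, otherwise fall back to a caller-supplied default.
    static String targetChecksumType(SketchFileChecksum source, String fallback) {
        String opt = source.getChecksumOpt();
        return opt != null ? opt : fallback;
    }

    public static void main(String[] args) {
        System.out.println(targetChecksumType(new LegacyChecksum(), "CRC32"));
        System.out.println(targetChecksumType(new Crc32cChecksum(), "CRC32"));
    }
}
```

If getChecksumOpt() were declared abstract instead, LegacyChecksum would no 
longer compile until its author added the method, which is the source 
incompatibility discussed above. The cost of the workaround is that every 
caller has to handle the null case.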

{quote}
I thought about this problem. To me checksum type may be a little bit different 
from other file attributes, since other file attributes are all metadata stored 
in NN. Thus in my first patch I just add a new option. But now I think to put 
the checksum type in the FileAttribute enum should be more clear.
{quote}
From the user's point of view, block size, replication and checksum option are 
the same kind of metadata. Only the FileSystem API treats the checksum option 
as a different kind of metadata, because the information is not stored in the 
same place.

{quote}
Currently I have a 001 patch which fixes the CreateFlag bug and adds a unit 
test. My original plan is to post it after I finish system test in my local 
cluster. But since you've worked on this issue for some time and already have a 
decent patch, I'd like to review your patch and commit it when it is ready. 
{quote}
My patch is mostly ready I think, but it is blocked by the other tickets I 
mentioned. Hopefully they will be reviewed quickly.

> Allow distcp to automatically identify the checksum type of source files and 
> use it for the target
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-10295
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10295
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HADOOP-10295.000.patch, hadoop-10295.patch
>
>
> Currently while doing distcp, users can use "-Ddfs.checksum.type" to specify 
> the checksum type in the target FS. This works fine if all the source files 
> are using the same checksum type. If files in the source cluster have mixed 
> types of checksum, users have to either use "-skipcrccheck" or have checksum 
> mismatching exception. Thus we may need to consider adding a new option to 
> distcp so that it can automatically identify the original checksum type of 
> each source file and use the same checksum type in the target FS. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
