[ 
https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HADOOP-10608:
-------------------------------

    Attachment: HADOOP-10608.001.patch

Thanks for the review, [~szetszwo]! Updated the patch to address your comments.

bq. FileSystem subclasses such as DistributedFileSystem only have to override 
the new getFileChecksum(..) method.

In the new patch, the only change to the existing API is that 
FileSystem.getFileChecksum(Path) now calls the new 
FileSystem.getFileChecksum(..) method with length = Long.MAX_VALUE. I have not 
changed DistributedFileSystem, since path resolution there may involve other 
FileSystems. I plan to update the FileSystem subclasses in a separate jira, 
where we will also add the new getFileChecksum API to other FS.
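
To make the delegation concrete, here is a minimal, self-contained sketch. It is not the code in the patch: SketchFileSystem and InMemoryFileSystem are hypothetical stand-ins for FileSystem and a subclass, paths are plain strings, and a CRC32 value stands in for FileChecksum. It only shows the whole-file call forwarding to the ranged call with Long.MAX_VALUE, and how a ranged checksum could be used to check that a target file is a prefix of the source.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

public class ChecksumDelegationSketch {

    /** Hypothetical stand-in for FileSystem; not the real Hadoop class. */
    static abstract class SketchFileSystem {
        /** Existing whole-file entry point, forwarding to the ranged variant. */
        public long getFileChecksum(String path) {
            return getFileChecksum(path, Long.MAX_VALUE);
        }

        /** New ranged entry point: checksum of the first {@code length} bytes. */
        public abstract long getFileChecksum(String path, long length);
    }

    /** Toy subclass that only overrides the ranged method. */
    static class InMemoryFileSystem extends SketchFileSystem {
        private final Map<String, byte[]> files = new HashMap<>();

        void put(String path, String content) {
            files.put(path, content.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public long getFileChecksum(String path, long length) {
            byte[] data = files.get(path);
            // Clamp to the file size so Long.MAX_VALUE means "the whole file".
            int n = (int) Math.min(length, (long) data.length);
            CRC32 crc = new CRC32();
            crc.update(data, 0, n);
            return crc.getValue();
        }
    }

    public static void main(String[] args) {
        InMemoryFileSystem fs = new InMemoryFileSystem();
        fs.put("/src", "hello world"); // source = target + appended data
        fs.put("/dst", "hello");

        // The whole-file call and the ranged call over the full length agree.
        System.out.println(fs.getFileChecksum("/src") == fs.getFileChecksum("/src", 11));

        // Checksum of the first 5 bytes of the source matches the target,
        // so only the remaining bytes would need to be transferred.
        System.out.println(fs.getFileChecksum("/src", 5) == fs.getFileChecksum("/dst"));
    }
}
```

In this sketch a subclass only has to implement the ranged method, which is the shape suggested in the review comment quoted above.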

> Support incremental data copy in DistCp
> ---------------------------------------
>
>                 Key: HADOOP-10608
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10608
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HADOOP-10608.000.patch, HADOOP-10608.001.patch
>
>
> Currently when doing distcp with -update option, for two files with the same 
> file names but with different file length or checksum, we overwrite the whole 
> file. It will be good if we can detect the case where (sourceFile = 
> targetFile + appended_data), and only transfer the appended data segment to 
> the target. This will be very useful if we're doing incremental distcp.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
