[ https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997212#comment-13997212 ]

Jing Zhao commented on HADOOP-10608:
------------------------------------

If both the source FS and the target FS are HDFS, I think we can do the following:
# Compare the lengths of the two files with the same name.
# If the source file is longer than the target file, compare the checksums of their common prefix (the first N bytes of each file, where N is the target file's length).
# If the checksums match, copy only the difference (the source bytes from offset N to the end) using positional read on the source and append on the target.
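The three steps above can be sketched roughly as follows. This is a minimal local-file illustration, not DistCp code: it assumes ordinary files stand in for HDFS, uses SHA-256 over the common prefix as a stand-in for the HDFS file checksum (`FileSystem#getFileChecksum`), and uses seek-then-read plus opening the target in append mode in place of HDFS positional read (`PositionedReadable`) and `FileSystem#append`. The names `sync_file` and `prefix_digest` are hypothetical.

```python
import hashlib
import os
import shutil

CHUNK = 64 * 1024  # read/copy buffer size (arbitrary choice for this sketch)


def prefix_digest(path: str, length: int) -> bytes:
    """SHA-256 of the first `length` bytes of `path` (stand-in for an HDFS checksum)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        remaining = length
        while remaining > 0:
            buf = f.read(min(CHUNK, remaining))
            if not buf:
                raise IOError(f"{path} is shorter than {length} bytes")
            h.update(buf)
            remaining -= len(buf)
    return h.digest()


def sync_file(src: str, dst: str) -> str:
    """Bring dst up to date with src; returns "skip", "append" or "overwrite"."""
    src_len = os.path.getsize(src)
    dst_len = os.path.getsize(dst) if os.path.exists(dst) else -1

    # Same length and same content: -update already leaves such files alone.
    if dst_len == src_len and prefix_digest(src, src_len) == prefix_digest(dst, dst_len):
        return "skip"

    # Steps 1-2: source is longer and the common prefix has the same checksum.
    if 0 <= dst_len < src_len and prefix_digest(src, dst_len) == prefix_digest(dst, dst_len):
        # Step 3: read the source from offset dst_len and append only that tail.
        with open(src, "rb") as fin, open(dst, "ab") as fout:
            fin.seek(dst_len)
            shutil.copyfileobj(fin, fout, CHUNK)
        return "append"

    # Target missing, longer than the source, or diverged: copy the whole file.
    shutil.copyfile(src, dst)
    return "overwrite"
```

The prefix checksum is what makes the append path safe: if the target was modified rather than merely left behind, its prefix no longer matches and the sketch falls back to the current whole-file overwrite.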

> Support appending data in DistCp
> --------------------------------
>
>                 Key: HADOOP-10608
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10608
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>
> Currently, when running distcp with the -update option, for two files with the 
> same name but with different lengths or checksums, we overwrite the whole 
> file. It would be good if we could detect the case where (sourceFile = 
> targetFile + appended_data) and transfer only the appended data segment to 
> the target. This would be very useful for incremental distcp.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
