[ https://issues.apache.org/jira/browse/HADOOP-10608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997212#comment-13997212 ]
Jing Zhao commented on HADOOP-10608:
------------------------------------
If both the source FS and the target FS are HDFS, I think what we can do here
is:
# Check the lengths of the two files with the same name.
# If the source file is longer than the target file, compare the checksums of their common-length prefix, i.e. the first N bytes of each file, where N is the target file's length.
# If the checksums match, copy only the difference, using positioned read on the source and append on the target.
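
The steps above can be sketched roughly as follows. This is only an illustration, not the proposed patch: it uses local files via java.nio and a java.util.zip.CRC32 over the prefix as stand-ins for HDFS files and HDFS checksums, and the class and method names (AppendSync, prefixChecksum, appendDifference) are hypothetical. A return value of -1 means "fall back to the existing -update behavior".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical helper; local files stand in for HDFS files.
public class AppendSync {

    /** CRC32 of the first {@code length} bytes of {@code file}. */
    static long prefixChecksum(Path file, long length) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        long remaining = length;
        try (InputStream in = Files.newInputStream(file)) {
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new IOException("File shorter than " + length + " bytes: " + file);
                }
                crc.update(buf, 0, n);
                remaining -= n;
            }
        }
        return crc.getValue();
    }

    /**
     * Appends the tail of {@code source} to {@code target} when target is a
     * strict prefix of source. Returns the number of bytes appended, or -1
     * when the append shortcut does not apply.
     */
    static long appendDifference(Path source, Path target) throws IOException {
        // Step 1: compare the lengths.
        long srcLen = Files.size(source);
        long dstLen = Files.size(target);
        if (srcLen <= dstLen) {
            return -1; // nothing to append; leave it to the normal comparison
        }
        // Step 2: compare checksums over the common-length prefix.
        if (prefixChecksum(source, dstLen) != prefixChecksum(target, dstLen)) {
            return -1; // prefixes differ, so the whole file must be copied
        }
        // Step 3: positioned read on the source, append on the target.
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.allocate(8192);
            long pos = dstLen;
            while (pos < srcLen) {
                buf.clear();
                buf.limit((int) Math.min(buf.capacity(), srcLen - pos));
                int n = in.read(buf, pos); // read at an explicit offset
                if (n < 0) {
                    break; // source was truncated while copying
                }
                buf.flip();
                while (buf.hasRemaining()) {
                    out.write(buf);
                }
                pos += n;
            }
            return pos - dstLen;
        }
    }
}
```

Note that a matching prefix checksum only makes it very likely, not certain, that the target is a prefix of the source; how strong that guarantee is depends on the checksum actually used.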
> Support appending data in DistCp
> --------------------------------
>
> Key: HADOOP-10608
> URL: https://issues.apache.org/jira/browse/HADOOP-10608
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Jing Zhao
> Assignee: Jing Zhao
>
> Currently, when running distcp with the -update option, if two files have the
> same name but different lengths or checksums, we overwrite the whole
> file. It would be good to detect the case where (sourceFile =
> targetFile + appended_data) and transfer only the appended data segment to
> the target. This would be very useful for incremental distcp.