[ https://issues.apache.org/jira/browse/HADOOP-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706241#comment-13706241 ]
Jing Zhao commented on HADOOP-9700:
-----------------------------------
Thanks for working on this, Binglin! Some questions and thoughts:
1. Is the smallest unit of copy in distcp a file? If so, we still need to transfer
the whole file in the append/flush case.
2. Before using the snapshot diff report to capture the differences between a
cluster and its backup cluster, maybe we can first support a more generic
snapshot-based scenario? I.e., before running distcp, we take a snapshot of the
corresponding files/directories in the source cluster and provide the snapshot
paths as the source file list. Because our snapshots are read-only, distcp is
then unaffected by renames, appends, and other modifications made while it is
running.
> Snapshot support for distcp
> ---------------------------
>
> Key: HADOOP-9700
> URL: https://issues.apache.org/jira/browse/HADOOP-9700
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools/distcp
> Reporter: Binglin Chang
> Assignee: Binglin Chang
> Attachments: HADOOP-9700-demo.patch
>
>
> Add snapshot incremental copy ability to distcp, so we can do iterative
> consistent backup between hadoop clusters.