[ 
https://issues.apache.org/jira/browse/HADOOP-9700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706241#comment-13706241
 ] 

Jing Zhao commented on HADOOP-9700:
-----------------------------------

Thanks for working on this, Binglin! Some questions and thoughts:

1. Is the smallest unit of distcp a file? If so, we still need to transfer 
the whole file in the append/flush case.
2. Before we use the snapshot diff report to capture the difference between a 
cluster and its backup cluster, maybe we can first support a more generic 
scenario using snapshots? I.e., before running distcp, we can take a snapshot 
of the corresponding files/directories in the source cluster and provide the 
snapshot paths as the source file list. Because snapshots are read-only, 
distcp would not be affected by rename, append, and other modification 
operations that happen while the copy is in progress.
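
A minimal sketch of the workflow in item 2, for illustration only. It builds 
(but does not run) the two commands involved: taking a snapshot of the source 
directory, then pointing distcp at the read-only path under .snapshot instead 
of the live directory. The directory, snapshot name, and target URI are 
hypothetical, and the sketch assumes the source directory has already been 
made snapshottable by an administrator.

```python
import posixpath

# HDFS exposes a snapshot of <dir> under <dir>/.snapshot/<snapshot name>.
SNAPSHOT_DIR = ".snapshot"


def snapshot_path(snapshottable_dir, snapshot_name):
    """Return the read-only path of the named snapshot of a directory."""
    return posixpath.join(snapshottable_dir, SNAPSHOT_DIR, snapshot_name)


def backup_commands(source_dir, snapshot_name, target_uri):
    """Build the command lines for a snapshot-based copy.

    Assumes source_dir is already snapshottable
    (e.g. via `hdfs dfsadmin -allowSnapshot`).
    """
    # Step 1: freeze the current state of the source directory.
    create = ["hdfs", "dfs", "-createSnapshot", source_dir, snapshot_name]
    # Step 2: copy from the snapshot path, not from the live directory,
    # so concurrent renames/appends on the source do not affect the copy.
    copy = ["hadoop", "distcp",
            snapshot_path(source_dir, snapshot_name), target_uri]
    return [create, copy]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in backup_commands("/data/logs", "s1",
                               "hdfs://backup-nn:8020/backup/logs"):
        print(" ".join(cmd))
```

Note that this still copies whole files, so it does not address the 
append/flush concern in item 1. It only gives distcp a stable view of the 
source.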
                
> Snapshot support for distcp
> ---------------------------
>
>                 Key: HADOOP-9700
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9700
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>         Attachments: HADOOP-9700-demo.patch
>
>
> Add snapshot incremental copy ability to distcp, so we can do iterative 
> consistent backup between Hadoop clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
