Yongjun Zhang updated HDFS-10314:
    Status: Patch Available  (was: Open)

Hi [~jingzhao],

Sorry, I got quite distracted, but I finally have a patch for this jira. The
fix probably needs some more polishing, followed by documentation since this
would be a new tool. However, I hope we can reach consensus on the approach
before spending more time on it.

Highlights of the main changes:
* Incorporated the HDFS-9820 changes I made earlier, and made sure that, from
the user's point of view, no new command line switch is added to DistCp for
this feature.
* This patch addresses the issue reported in HDFS-10263 by massaging the
reverted snapshot diff.
* Instead of using a script to wrap the distcp code, I implemented a new Java
class called DistSync that derives from DistCp; from the user's point of view,
it is a tool parallel to DistCp.
* DistSync requires either the -diff or the -rdiff command line switch, and it
supports copying data from a mirror source cluster in addition to copying from
a snapshot of the same target cluster.
* I dropped the fallback code that says "if the -diff work fails, fall back to
a normal distcp". I think we should just let the distcp fail in that case and
let the user re-issue a distcp command without -diff.
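To illustrate what applying a "reverted" snapshot diff means, here is a minimal, self-contained Java sketch (Java 16+). It does not use the real HDFS SnapshotDiffReport API: DiffType, Entry and reverse() are hypothetical stand-ins whose entry kinds mirror the CREATE/MODIFY/DELETE/RENAME categories of a snapshot diff. It assumes a naive reversal (CREATE and DELETE swap, RENAME swaps source and target, MODIFY stays, entries are undone in reverse order); it does not reproduce the HDFS-10263 massaging in the patch, which may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReverseDiffSketch {

    // Hypothetical mirror of the four kinds of snapshot diff entries.
    enum DiffType { CREATE, MODIFY, DELETE, RENAME }

    // Hypothetical diff entry; target is only meaningful for RENAME.
    record Entry(DiffType type, String source, String target) {
        @Override
        public String toString() {
            return type == DiffType.RENAME
                ? type + " " + source + " -> " + target
                : type + " " + source;
        }
    }

    // Naively turn a diff from snapshot A to snapshot B into one from B back to A.
    static List<Entry> reverse(List<Entry> forward) {
        List<Entry> reversed = new ArrayList<>();
        for (Entry e : forward) {
            switch (e.type()) {
                case CREATE -> reversed.add(new Entry(DiffType.DELETE, e.source(), null));
                case DELETE -> reversed.add(new Entry(DiffType.CREATE, e.source(), null));
                case RENAME -> reversed.add(new Entry(DiffType.RENAME, e.target(), e.source()));
                case MODIFY -> reversed.add(e);
            }
        }
        // Assumption: entries are treated as an ordered list of operations,
        // so they are undone in the opposite order.
        Collections.reverse(reversed);
        return reversed;
    }

    public static void main(String[] args) {
        List<Entry> forward = List.of(
            new Entry(DiffType.CREATE, "/data/new.txt", null),
            new Entry(DiffType.RENAME, "/data/a", "/data/b"),
            new Entry(DiffType.MODIFY, "/data/c", null));
        // Prints: MODIFY /data/c, RENAME /data/b -> /data/a, DELETE /data/new.txt
        reverse(forward).forEach(System.out::println);
    }
}
```

The naive version shows why some massaging is needed: a real snapshot diff report is not an operation log, so entries that touch renamed paths cannot always be swapped independently.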

I would really appreciate it if you could give it a preliminary review.

Thanks much.

> Propose a new tool that wraps around distcp to "restore" changes on target 
> cluster
> ----------------------------------------------------------------------------------
>                 Key: HDFS-10314
>                 URL: https://issues.apache.org/jira/browse/HDFS-10314
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-10314.001.patch
> HDFS-9820 proposed adding an -rdiff switch to distcp as the reverse operation
> of the -diff switch.
> Upon discussion with [~jingzhao], we will introduce a new tool that wraps 
> around distcp to achieve the same purpose.
> I'm thinking about calling the new tool "rsync", similar to the Unix/Linux
> "rsync" command. The "r" here means remote.
> The syntax that simulates the -rdiff behavior proposed in HDFS-9820 is
> {code}
> rsync <fromSnapshotName>  <toSnapshotName>  <source> <target>
> {code}
> This command ensures <fromSnapshotName> is newer than <toSnapshotName>.
> I think in the future we can add another command that provides the
> functionality of the -diff switch of distcp:
> {code}
> sync <fromSnapshotName>  <toSnapshotName>  <source> <target>
> {code}
> that ensures <fromSnapshotName> is older than <toSnapshotName>.
> Thanks [~jingzhao].
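The ordering rule in the proposal (rsync requires <fromSnapshotName> to be newer than <toSnapshotName>, sync requires it to be older) could be checked before any copying starts. Below is a minimal Java sketch (Java 16+), assuming each snapshot can be identified by a creation sequence number. Mode, Snapshot and validateOrder are hypothetical names, not part of DistCp or the attached patch, and rejecting identical snapshots is an added assumption.

```java
public class SnapshotOrderCheck {

    // Hypothetical modes: SYNC mirrors distcp -diff, RSYNC mirrors the proposed -rdiff.
    enum Mode { SYNC, RSYNC }

    // Hypothetical snapshot handle; a larger createdSeq means a newer snapshot.
    record Snapshot(String name, long createdSeq) {}

    // Throws IllegalArgumentException if the pair is in the wrong order for the mode.
    static void validateOrder(Mode mode, Snapshot from, Snapshot to) {
        // Assumption: comparing a snapshot with itself is rejected in both modes.
        if (from.createdSeq() == to.createdSeq()) {
            throw new IllegalArgumentException("from and to snapshots must differ");
        }
        boolean fromIsNewer = from.createdSeq() > to.createdSeq();
        if (mode == Mode.RSYNC && !fromIsNewer) {
            throw new IllegalArgumentException(
                "rsync: " + from.name() + " must be newer than " + to.name());
        }
        if (mode == Mode.SYNC && fromIsNewer) {
            throw new IllegalArgumentException(
                "sync: " + from.name() + " must be older than " + to.name());
        }
    }

    public static void main(String[] args) {
        Snapshot s1 = new Snapshot("s1", 1);
        Snapshot s2 = new Snapshot("s2", 2);
        validateOrder(Mode.SYNC, s1, s2);   // passes: s1 is older than s2
        validateOrder(Mode.RSYNC, s2, s1);  // passes: s2 is newer than s1
        try {
            validateOrder(Mode.RSYNC, s1, s2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // rsync: s1 must be newer than s2
        }
    }
}
```

Failing fast here matches the approach of letting the command fail rather than silently falling back to a normal distcp.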

This message was sent by Atlassian JIRA
