[ 
https://issues.apache.org/jira/browse/HDFS-13123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911849#comment-16911849
 ] 

Wei-Chiu Chuang commented on HDFS-13123:
----------------------------------------

This patch uses distcp + snapshot. [~smeng] FYI.

Given how much "experience" we have associated with distcp + snapshot, I want 
to be very careful with this patch.

You should make sure the directories on both the source and the destination 
are snapshottable before running this tool.

It's probably not a good idea to hard-code the snapshot names as "s1" and 
"s2". Use randomly generated names instead.
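
As a minimal sketch of what a non-hard-coded name could look like (the class 
name, method name, and "rbf-balance-" prefix are hypothetical and not taken 
from the patch), a UUID suffix keeps names unique across concurrent or 
repeated runs:

```java
import java.util.UUID;

public class SnapshotNames {
    // Hypothetical prefix, chosen here only so leftover snapshots are easy to spot.
    static final String PREFIX = "rbf-balance-";

    // Returns a name that is unique per call; a UUID contains only hex digits
    // and '-', so the result never contains a path separator.
    static String newSnapshotName() {
        return PREFIX + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Prints the prefix followed by a random UUID.
        System.out.println(newSnapshotName());
    }
}
```

The same generated name would have to be used on both the source and the 
destination so that the two sides can still be matched up.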

I don't understand why you create two snapshots in the source cluster almost 
back to back. If you do so, you only pick up the files added or deleted 
between the two snapshots.

The distcp -diff command is meant for a read-only destination. The state of 
the "s1" snapshot on the source should be exactly the same as the state of 
the "s1" snapshot on the destination. You'll hit various strange issues if 
the destination is not a mirror of the source. This is either not the right 
way to use the tool, or not the right tool for the use case. 

Additionally, make sure you delete the snapshots even if the prior steps hit 
errors. Otherwise you'll end up with thousands of leftover snapshots.
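
One way to guarantee that cleanup is a try/finally around the whole 
operation. The sketch below is only an illustration: SnapshotOps is a 
hypothetical stand-in for whatever snapshot calls the tool makes, and 
runWithCleanup is not a method from the patch. It tracks which snapshots were 
actually created and deletes exactly those, whether or not the copy step 
fails:

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotCleanup {
    /** Hypothetical stand-in for the snapshot operations the tool performs. */
    interface SnapshotOps {
        void createSnapshot(String dir, String name) throws Exception;
        void deleteSnapshot(String dir, String name) throws Exception;
    }

    /**
     * Creates the given snapshots, runs the copy step, and always deletes
     * every snapshot that was successfully created, even if a later step throws.
     */
    static void runWithCleanup(SnapshotOps ops, String dir, List<String> names,
                               Runnable copyStep) throws Exception {
        List<String> created = new ArrayList<>();
        try {
            for (String name : names) {
                ops.createSnapshot(dir, name);
                created.add(name); // record only after creation succeeded
            }
            copyStep.run();
        } finally {
            for (String name : created) {
                try {
                    ops.deleteSnapshot(dir, name);
                } catch (Exception e) {
                    // Keep going so one failed delete does not strand the rest.
                    System.err.println("Failed to delete snapshot " + name
                            + " under " + dir + ": " + e);
                }
            }
        }
    }
}
```

A failed delete is logged rather than rethrown here so that it does not mask 
the original error; a real tool might also want to report such leftovers to 
the operator.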

> RBF: Add a balancer tool to move data across subcluster 
> --------------------------------------------------------
>
>                 Key: HDFS-13123
>                 URL: https://issues.apache.org/jira/browse/HDFS-13123
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Wei Yan
>            Assignee: hemanthboyina
>            Priority: Major
>         Attachments: HDFS Router-Based Federation Rebalancer.pdf, 
> HDFS-13123.patch
>
>
> Follow the discussion in HDFS-12615. This Jira is to track effort for 
> building a rebalancer tool, used by router-based federation to move data 
> among subclusters.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
