[
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101721#comment-17101721
]
Ayush Saxena commented on HDFS-15294:
-------------------------------------
Thanks [~LiJinglun] for the patch. I have just started reading the code,
working from the document. It will take me some time to get to the end.
* One doubt I have is how EC files would be handled. I don't think DistCp
supports preserving EC here on trunk?
* There would be a couple of other attributes too, like storage policy, xAttrs
and so on, that I guess won't get carried forward?
* While updating the mount point after DistCp succeeds, is there a chance of a
race condition, with data changing in between? We could make the mount entry
read-only temporarily, if that is not done already.
* Can you add some details for these too:
{code:java}
* SingleMountTableProcedure: This procedure updates the mount entry in Router.
+
+ * TrashProcedure: This procedure move the source path to trash.
{code}
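To make the race-condition point above concrete, here is a minimal sketch of the ordering I have in mind: mark the mount entry read-only, run the final sync, repoint the entry, then restore the previous read-only flag. All names here (MountEntrySketch, ReadOnlySwitchSketch, switchMount) are hypothetical and are not the Router's actual API or anything from the patch; the final sync is just a Runnable standing in for the last snapshot-diff DistCp round.

```java
// Hypothetical stand-in for a Router mount table entry (not the real class).
final class MountEntrySketch {
    final String mountPoint;
    String nameservice;
    String destPath;
    boolean readOnly;

    MountEntrySketch(String mountPoint, String nameservice, String destPath) {
        this.mountPoint = mountPoint;
        this.nameservice = nameservice;
        this.destPath = destPath;
    }
}

final class ReadOnlySwitchSketch {
    // Assumed ordering: block writes, do the final sync, repoint the entry,
    // then restore the previous read-only flag whether or not the sync worked.
    static void switchMount(MountEntrySketch entry, String dstNs,
                            String dstPath, Runnable finalSync) {
        boolean wasReadOnly = entry.readOnly;
        entry.readOnly = true;            // no writes via the mount point from here
        try {
            finalSync.run();              // last incremental copy; may throw
            entry.nameservice = dstNs;    // only reached if the sync succeeded
            entry.destPath = dstPath;
        } finally {
            entry.readOnly = wasReadOnly; // never leave the entry stuck read-only
        }
    }

    public static void main(String[] args) {
        MountEntrySketch entry = new MountEntrySketch("/data", "ns0", "/data");
        switchMount(entry, "ns1", "/data",
            () -> System.out.println("read-only during sync: " + entry.readOnly));
        System.out.println(entry.nameservice + ":" + entry.destPath
            + " readOnly=" + entry.readOnly);
    }
}
```

If the final sync throws, the entry still points at the source and the read-only flag is restored, so a failed attempt leaves the mount table as it was. Whether the real procedure should behave like this on failure is an open question for the design.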
> RBF: Balance data across federation namespaces with DistCp and snapshot diff
> ----------------------------------------------------------------------------
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch,
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch,
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch,
> HDFS-15294.007.patch, distcp-balance.pdf
>
>
> This jira introduces a new balance command 'fedbalance' that is run by the
> administrator. The process is:
> 1. Use distcp and snapshot diff to sync data between src and dst until they
> are the same.
> 2. Update mount table in Router.
> 3. Delete the src to trash.
>
> The patch is too big to review, so I split it into 2 patches:
> Phase 1 / The State Machine(BalanceProcedureScheduler): Including the
> abstraction of job and scheduler model. <See HDFS-15340>
> {code:java}
> org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
> org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
> org.apache.hadoop.hdfs.procedure.BalanceProcedure;
> org.apache.hadoop.hdfs.procedure.BalanceJob;
> org.apache.hadoop.hdfs.procedure.BalanceJournal;
> org.apache.hadoop.hdfs.procedure.HDFSJournal;
> {code}
> Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob.
> <Pending...>
> {code:java}
> org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
> org.apache.hadoop.tools.DistCpFedBalance;
> org.apache.hadoop.tools.DistCpProcedure;
> org.apache.hadoop.tools.FedBalance;
> org.apache.hadoop.tools.FedBalanceConfigs;
> org.apache.hadoop.tools.FedBalanceContext;
> org.apache.hadoop.tools.TrashProcedure;
> {code}