[
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102677#comment-17102677
]
Yiqun Lin commented on HDFS-15294:
----------------------------------
Hi [~LiJinglun], thanks for the great work on this!
Some comments for current design:
* Current fedbalance tool seems can be only used for the case of RBF, but
actually this tool can also help balance data in normal federation case. It
will be better to additionally add an option to allow submit BalancerJob
without MountTableProcedure.
* Now current implementation completely different with the origin design in
HDFS-15087 which uses the way like, block writes -> saveTree -> graftTree ->
hardlink -> update router mount table. Does that mean you change the way that
you implemented before? But anyway, current fedbalance tool way looks good, it
doesn't lock the folder and can minimum the user impact.
> RBF: Balance data across federation namespaces with DistCp and snapshot diff
> ----------------------------------------------------------------------------
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch,
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch,
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch,
> HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf
>
>
> This jira introduces a new balance command 'fedbalance' that is ran by the
> administrator. The process is:
> 1. Use distcp and snapshot diff to sync data between src and dst until they
> are the same.
> 2. Update mount table in Router.
> 3. Delete the src to trash.
>
> The patch is too big to review, so I split it into 2 patches:
> Phase 1 / The State Machine(BalanceProcedureScheduler): Including the
> abstraction of job and scheduler model. <See HDFS-15340>
> {code:java}
> org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
> org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
> org.apache.hadoop.hdfs.procedure.BalanceProcedure;
> org.apache.hadoop.hdfs.procedure.BalanceJob;
> org.apache.hadoop.hdfs.procedure.BalanceJournal;
> org.apache.hadoop.hdfs.procedure.HDFSJournal;
> {code}
> Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob.
> <Pending...>
> {code:java}
> org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
> org.apache.hadoop.tools.DistCpFedBalance;
> org.apache.hadoop.tools.DistCpProcedure;
> org.apache.hadoop.tools.FedBalance;
> org.apache.hadoop.tools.FedBalanceConfigs;
> org.apache.hadoop.tools.FedBalanceContext;
> org.apache.hadoop.tools.TrashProcedure;
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]