[
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103104#comment-17103104
]
Yiqun Lin commented on HDFS-15294:
----------------------------------
Hi [~LiJinglun], I agree that starting with distcp diff copy is a good choice
and be more acceptable.
{quote}I didn't give up the HDFS-15087 way. Though it impacts user a lot but is
more efficient.
{quote}
>From my personal opinion, user impact is more important than efficiency. The
>efficiency of fedbalace job is impacted by multiple diff distcp be scheduled
>not the balance data slowness. So it's not a big problem I think. Any thoughts
>from others?
{quote}Another question is all the suggestions seem to be related to the
DistCpFedBalance part, which is in the second patch. Could you help to review
the first patch so I can start working on the second one.
{quote}
Yes, I can help do this part of review work in recent days.
> RBF: Balance data across federation namespaces with DistCp and snapshot diff
> ----------------------------------------------------------------------------
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch,
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch,
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch,
> HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf
>
>
> This jira introduces a new balance command 'fedbalance' that is ran by the
> administrator. The process is:
> 1. Use distcp and snapshot diff to sync data between src and dst until they
> are the same.
> 2. Update mount table in Router.
> 3. Delete the src to trash.
>
> The patch is too big to review, so I split it into 2 patches:
> Phase 1 / The State Machine(BalanceProcedureScheduler): Including the
> abstraction of job and scheduler model. <See HDFS-15340>
> {code:java}
> org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
> org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
> org.apache.hadoop.hdfs.procedure.BalanceProcedure;
> org.apache.hadoop.hdfs.procedure.BalanceJob;
> org.apache.hadoop.hdfs.procedure.BalanceJournal;
> org.apache.hadoop.hdfs.procedure.HDFSJournal;
> {code}
> Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob. <See
> HDFS-15346>
> {code:java}
> org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
> org.apache.hadoop.tools.DistCpFedBalance;
> org.apache.hadoop.tools.DistCpProcedure;
> org.apache.hadoop.tools.FedBalance;
> org.apache.hadoop.tools.FedBalanceConfigs;
> org.apache.hadoop.tools.FedBalanceContext;
> org.apache.hadoop.tools.TrashProcedure;
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]