[
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yiqun Lin updated HDFS-15294:
-----------------------------
Description:
This jira introduces a new HDFS federation balance tool to balance data across
different federation namespaces. It uses Distcp to copy data from the source
path to the target path.
The process is:
1. Use distcp and snapshot diff to sync data between src and dst until they
are the same.
2. Update the mount table in the Router if RBF mode is specified.
3. Handle the src data: move it to trash, delete it, or skip it.
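The three steps above can be sketched roughly as follows. This is only an
illustration of the control flow, not the code in the patch: the Cluster
interface, FakeCluster, SrcAction and run() are hypothetical names invented
for this sketch, and the real tool drives DistCp and the Router rather than
an in-memory fake.

```java
import java.util.ArrayList;
import java.util.List;

public class FedBalanceSketch {
    /** What to do with the source data once src and dst are the same. */
    enum SrcAction { TRASH, DELETE, SKIP }

    /** Hypothetical stand-in for the cluster operations; not a Hadoop API. */
    interface Cluster {
        int pendingDiffEntries();            // size of the snapshot diff between src and dst
        void copyDiff();                     // one distcp round that applies the current diff
        void updateMountTable();             // point the Router mount entry at dst
        void handleSource(SrcAction action); // trash, delete, or leave the src path
    }

    /** Runs the three steps and returns the names of the steps performed. */
    static List<String> run(Cluster cluster, boolean rbfMode, SrcAction action) {
        List<String> log = new ArrayList<>();
        // Step 1: repeat incremental copies until the snapshot diff is empty.
        while (cluster.pendingDiffEntries() > 0) {
            cluster.copyDiff();
            log.add("sync");
        }
        // Step 2: the mount table is only touched in RBF mode.
        if (rbfMode) {
            cluster.updateMountTable();
            log.add("mount");
        }
        // Step 3: deal with the src data according to the chosen action.
        cluster.handleSource(action);
        log.add("src:" + action);
        return log;
    }

    /** In-memory fake: each copy round leaves one fewer round of changes to sync. */
    static class FakeCluster implements Cluster {
        private int roundsLeft;
        FakeCluster(int roundsLeft) { this.roundsLeft = roundsLeft; }
        public int pendingDiffEntries() { return roundsLeft; }
        public void copyDiff() { roundsLeft--; }
        public void updateMountTable() { }
        public void handleSource(SrcAction action) { }
    }

    public static void main(String[] args) {
        // Prints [sync, sync, mount, src:TRASH]
        System.out.println(run(new FakeCluster(2), true, SrcAction.TRASH));
    }
}
```

With rbfMode set to false the "mount" step is skipped and only the sync rounds
and the src handling are performed.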
The patch is too big to review, so I split it into 2 patches:
Phase 1 / The State Machine (BalanceProcedureScheduler): includes the
abstraction of the job and scheduler model. <See HDFS-15340>
{code:java}
org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
org.apache.hadoop.hdfs.procedure.BalanceProcedure;
org.apache.hadoop.hdfs.procedure.BalanceJob;
org.apache.hadoop.hdfs.procedure.BalanceJournal;
org.apache.hadoop.hdfs.procedure.HDFSJournal;
{code}
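As a rough illustration of the job and scheduler model, a job can be seen as
an ordered list of steps plus a journal that remembers how far it got, so a
restarted job resumes instead of starting over. The sketch below assumes
exactly that and nothing more: Step, Journal, runJob() and step() are
hypothetical names, the journal is kept in memory, and none of this is the
API of the classes listed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JournaledJobSketch {
    /** One step of a job. Hypothetical; not the Hadoop BalanceProcedure API. */
    interface Step {
        String name();
        void execute() throws Exception;
    }

    /** Remembers the index of the next step to run. Kept in memory for this sketch. */
    static class Journal {
        private int nextStep = 0;
        int load() { return nextStep; }
        void save(int next) { nextStep = next; }
    }

    /** Runs steps from the journaled position; returns true once every step has finished. */
    static boolean runJob(List<Step> steps, Journal journal, List<String> executed) {
        for (int i = journal.load(); i < steps.size(); i++) {
            try {
                steps.get(i).execute();
            } catch (Exception e) {
                return false; // the journal still points at the failed step
            }
            executed.add(steps.get(i).name());
            journal.save(i + 1); // record progress only after the step succeeded
        }
        return true;
    }

    /** Builds a step that fails the given number of times before succeeding. */
    static Step step(String name, int failures) {
        return new Step() {
            private int remainingFailures = failures;
            public String name() { return name; }
            public void execute() throws Exception {
                if (remainingFailures > 0) {
                    remainingFailures--;
                    throw new Exception(name + " failed");
                }
            }
        };
    }

    public static void main(String[] args) {
        List<Step> steps = Arrays.asList(
            step("distcp", 0), step("mount-table", 1), step("trash", 0));
        Journal journal = new Journal();
        List<String> executed = new ArrayList<>();
        System.out.println(runJob(steps, journal, executed)); // false: mount-table fails once
        System.out.println(runJob(steps, journal, executed)); // true: resumes at mount-table
        System.out.println(executed);                         // [distcp, mount-table, trash]
    }
}
```

The second call does not repeat the "distcp" step because the journal already
recorded it as done; a persistent journal would give the same behaviour
across process restarts.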
Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob. <See
HDFS-15346>
{code:java}
org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
org.apache.hadoop.tools.DistCpFedBalance;
org.apache.hadoop.tools.DistCpProcedure;
org.apache.hadoop.tools.FedBalance;
org.apache.hadoop.tools.FedBalanceConfigs;
org.apache.hadoop.tools.FedBalanceContext;
org.apache.hadoop.tools.TrashProcedure;
{code}
was:
This jira introduces a new balance command 'fedbalance' that is run by the
administrator. The process is:
1. Use distcp and snapshot diff to sync data between src and dst until they
are the same.
2. Update mount table in Router.
3. Move the src to trash.
The patch is too big to review, so I split it into 2 patches:
Phase 1 / The State Machine (BalanceProcedureScheduler): includes the
abstraction of the job and scheduler model. <See HDFS-15340>
{code:java}
org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
org.apache.hadoop.hdfs.procedure.BalanceProcedure;
org.apache.hadoop.hdfs.procedure.BalanceJob;
org.apache.hadoop.hdfs.procedure.BalanceJournal;
org.apache.hadoop.hdfs.procedure.HDFSJournal;
{code}
Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob. <See
HDFS-15346>
{code:java}
org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
org.apache.hadoop.tools.DistCpFedBalance;
org.apache.hadoop.tools.DistCpProcedure;
org.apache.hadoop.tools.FedBalance;
org.apache.hadoop.tools.FedBalanceConfigs;
org.apache.hadoop.tools.FedBalanceContext;
org.apache.hadoop.tools.TrashProcedure;
{code}
> Federation balance tool
> -----------------------
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch,
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch,
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch,
> HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf