[
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yiqun Lin updated HDFS-15294:
-----------------------------
Description:
This jira introduces a new HDFS federation balance tool to balance data across
different federation namespaces. It uses DistCp to copy data from the source
path to the target path.
The process is:
1. Use DistCp and snapshot diff to sync data between src and dst until they
are the same.
2. Update the mount table in the Router if RBF mode is specified.
3. Handle the src data: move it to trash, delete it, or skip it.
The design of the fedbalance tool comes from the discussion in HDFS-15087.
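The three steps can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the code in the attached patches: `FedBalanceSketch`, `Namespace`, `SourceAction` and every method below are hypothetical names, plain maps stand in for namespaces, snapshots and the Router mount table, and no real DistCp or Router API is called.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the three steps; not the HDFS-15294 code. */
public class FedBalanceSketch {

    /** Step 3 options for the source data. */
    public enum SourceAction { TRASH, DELETE, SKIP }

    /** Stand-in for one namespace: path -> content, plus a trash area. */
    public static final class Namespace {
        public final Map<String, String> files = new TreeMap<>();
        public final Map<String, String> trash = new TreeMap<>();
    }

    /**
     * Step 1: repeatedly "snapshot" src and apply the diff against the previous
     * snapshot to dst, until a round finds no changes. Assumes dst starts empty.
     * writesDuringCopy simulates clients changing src while a round is running.
     * Returns the number of copy rounds performed.
     */
    public static int syncUntilEqual(Namespace src, Namespace dst,
                                     Deque<Runnable> writesDuringCopy) {
        Map<String, String> previous = new TreeMap<>();
        int rounds = 0;
        while (true) {
            Map<String, String> current = new TreeMap<>(src.files); // take a snapshot
            if (current.equals(previous)) {
                return rounds; // empty diff: dst already matches src
            }
            // Apply deletions recorded in the diff.
            for (String path : previous.keySet()) {
                if (!current.containsKey(path)) {
                    dst.files.remove(path);
                }
            }
            // Apply creations and modifications recorded in the diff.
            for (Map.Entry<String, String> e : current.entrySet()) {
                if (!e.getValue().equals(previous.get(e.getKey()))) {
                    dst.files.put(e.getKey(), e.getValue());
                }
            }
            rounds++;
            Runnable write = writesDuringCopy.poll(); // a write that raced with the copy
            if (write != null) {
                write.run();
            }
            previous = current;
        }
    }

    /** Runs steps 1 to 3 in order and returns the number of copy rounds. */
    public static int balance(Namespace src, Namespace dst, Deque<Runnable> writesDuringCopy,
                              Map<String, String> mountTable, String mountPoint,
                              String dstNameservice, boolean rbfMode, SourceAction action) {
        int rounds = syncUntilEqual(src, dst, writesDuringCopy);
        if (rbfMode) {
            mountTable.put(mountPoint, dstNameservice); // step 2: repoint the mount entry
        }
        switch (action) { // step 3: handle the source data
            case TRASH:
                src.trash.putAll(src.files);
                src.files.clear();
                break;
            case DELETE:
                src.files.clear();
                break;
            case SKIP:
            default:
                break; // leave the source untouched
        }
        return rounds;
    }

    public static void main(String[] args) {
        Namespace src = new Namespace();
        Namespace dst = new Namespace();
        src.files.put("/data/a", "1");
        src.files.put("/data/b", "2");
        Deque<Runnable> writes = new ArrayDeque<>();
        writes.add(() -> src.files.put("/data/c", "3")); // lands during the first round
        Map<String, String> mountTable = new HashMap<>();
        mountTable.put("/data", "ns0");
        int rounds = balance(src, dst, writes, mountTable, "/data", "ns1",
                true, SourceAction.TRASH);
        System.out.println("rounds=" + rounds + " dst=" + dst.files
                + " mount=" + mountTable);
    }
}
```

In this model a write that arrives during a copy round is picked up by the next round's diff, which is why the loop only stops once a snapshot shows no change from the previous one.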
was:
This jira introduces a new HDFS federation balance tool to balance data across
different federation namespaces. It uses DistCp to copy data from the source
path to the target path.
The process is:
1. Use DistCp and snapshot diff to sync data between src and dst until they
are the same.
2. Update the mount table in the Router if RBF mode is specified.
3. Handle the src data: move it to trash, delete it, or skip it.
This
The patch is too big to review, so I split it into 2 patches:
Phase 1 / The State Machine (BalanceProcedureScheduler): includes the
abstraction of the job and scheduler model. <See HDFS-15340>
{code:java}
org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
org.apache.hadoop.hdfs.procedure.BalanceProcedure;
org.apache.hadoop.hdfs.procedure.BalanceJob;
org.apache.hadoop.hdfs.procedure.BalanceJournal;
org.apache.hadoop.hdfs.procedure.HDFSJournal;
{code}
Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob. <See
HDFS-15346>
{code:java}
org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
org.apache.hadoop.tools.DistCpFedBalance;
org.apache.hadoop.tools.DistCpProcedure;
org.apache.hadoop.tools.FedBalance;
org.apache.hadoop.tools.FedBalanceConfigs;
org.apache.hadoop.tools.FedBalanceContext;
org.apache.hadoop.tools.TrashProcedure;
{code}
> Federation balance tool
> -----------------------
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch,
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch,
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch,
> HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf
>
>
> This jira introduces a new HDFS federation balance tool to balance data
> across different federation namespaces. It uses DistCp to copy data from the
> source path to the target path.
> The process is:
> 1. Use DistCp and snapshot diff to sync data between src and dst until they
> are the same.
> 2. Update the mount table in the Router if RBF mode is specified.
> 3. Handle the src data: move it to trash, delete it, or skip it.
> The design of the fedbalance tool comes from the discussion in HDFS-15087.