[
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102309#comment-17102309
]
Jinglun commented on HDFS-15294:
--------------------------------
Hi [~ayushtkn], thanks for your nice comments.
{quote}One doubt that I am having is how would EC files would be handled,
distCp I don't think support preserve EC here at trunk?
And there would couple of other parameters too like storage policy xAttrs and
stuff that I guess won't get carry forwarded?
{quote}
The hadoop-federation-balance tool relies on distcp. Currently distcp doesn't
support preserving EC or storage type. We can start a new sub-task to add that
support to distcp. Once that is done, the hadoop-federation-balance tool can
support EC and storage type as well.
{quote}While updating the mount point post distCp is success, Is there a chance
of a Race condition? Data changing in between. We can turn the mount entry read
only temporarily, if not done already.
{quote}
Currently the source is made read-only by cancelling the x permission of the
source path. Using the mount entry is a better choice. Since I have split the
work into 2 patches, the mount entry code is in the second patch. I'll update
it and upload it after the first patch is approved.
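To illustrate what "cancelling the x permission" means at the level of mode
bits, here is a minimal sketch in plain Java. It is not code from the patch:
the class and method names are hypothetical, and it only models the bit
arithmetic. In HDFS the x bit on a directory is what allows access to its
children, so clearing it blocks non-superuser clients from reaching anything
under the source path.

```java
/** Hypothetical illustration only; not taken from the HDFS-15294 patch. */
class ReadOnlyByPermissionSketch {

  /** The execute bits for user, group and other (octal 111). */
  static final int EXECUTE_BITS = 0111;

  /** Returns the mode with every execute bit cleared. */
  static int dropExecute(int mode) {
    return mode & ~EXECUTE_BITS;
  }

  /** Formats the low nine permission bits as three octal digits. */
  static String toOctal(int mode) {
    return String.format("%03o", mode & 0777);
  }

  public static void main(String[] args) {
    int original = 0755;                  // rwxr-xr-x
    int locked = dropExecute(original);   // rw-r--r--
    System.out.println(toOctal(original) + " -> " + toOctal(locked));
  }
}
```

The original mode has to be remembered by the caller so that it can be restored
if the balance job is cancelled.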
Details for MountTableProcedure and TrashProcedure: I'll upload a new pdf
explaining them.
We can review the patch at HDFS-15340 first. The first patch only contains a
scheduler model. v07 can be taken as a reference of how the
hadoop-federation-balance module would use the scheduler.
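As a rough picture of the scheduler model, the sketch below runs the three
steps from the issue description (sync with distcp and snapshot diff, update
the mount table, move the source to trash) as a sequence of procedures and
records each outcome in a journal. This is a simplified assumption of the idea,
not the API in the patch: Procedure, runJob, step and the journal strings are
all hypothetical names, and the real BalanceProcedureScheduler may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from the HDFS-15340 patch. */
class FedBalanceSketch {

  /** One step of a job. execute() returns true once the step has finished. */
  interface Procedure {
    String name();
    boolean execute();
  }

  /**
   * Runs the procedures in order. A step that returns false is retried up to
   * maxRetries times; the job stops at the first step that never finishes.
   */
  static List<String> runJob(List<Procedure> procedures, int maxRetries) {
    List<String> journal = new ArrayList<>();
    for (Procedure p : procedures) {
      int attempts = 0;
      while (!p.execute()) {
        attempts++;
        if (attempts > maxRetries) {
          journal.add("FAILED:" + p.name());
          return journal;
        }
      }
      journal.add("DONE:" + p.name());
    }
    return journal;
  }

  /** A stand-in step that finishes on its roundsNeeded-th execution. */
  static Procedure step(String name, int roundsNeeded) {
    int[] remaining = {roundsNeeded};
    return new Procedure() {
      public String name() { return name; }
      public boolean execute() { return --remaining[0] <= 0; }
    };
  }

  public static void main(String[] args) {
    List<Procedure> job = Arrays.asList(
        step("distcp-sync", 3),         // several diff rounds until src == dst
        step("update-mount-table", 1),
        step("move-src-to-trash", 1));
    System.out.println(runJob(job, 5));
  }
}
```

Keeping the journal separate from the procedures is what would let a job
resume from the last finished step after a restart.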
> RBF: Balance data across federation namespaces with DistCp and snapshot diff
> ----------------------------------------------------------------------------
>
> Key: HDFS-15294
> URL: https://issues.apache.org/jira/browse/HDFS-15294
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch,
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch,
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch,
> HDFS-15294.007.patch, distcp-balance.pdf
>
>
> This jira introduces a new balance command 'fedbalance' that is run by the
> administrator. The process is:
> 1. Use distcp and snapshot diff to sync data between src and dst until they
> are the same.
> 2. Update mount table in Router.
> 3. Delete the src to trash.
>
> The patch is too big to review, so I split it into 2 patches:
> Phase 1 / The State Machine(BalanceProcedureScheduler): Including the
> abstraction of job and scheduler model. <See HDFS-15340>
> {code:java}
> org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
> org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
> org.apache.hadoop.hdfs.procedure.BalanceProcedure;
> org.apache.hadoop.hdfs.procedure.BalanceJob;
> org.apache.hadoop.hdfs.procedure.BalanceJournal;
> org.apache.hadoop.hdfs.procedure.HDFSJournal;
> {code}
> Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob.
> <Pending...>
> {code:java}
> org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
> org.apache.hadoop.tools.DistCpFedBalance;
> org.apache.hadoop.tools.DistCpProcedure;
> org.apache.hadoop.tools.FedBalance;
> org.apache.hadoop.tools.FedBalanceConfigs;
> org.apache.hadoop.tools.FedBalanceContext;
> org.apache.hadoop.tools.TrashProcedure;
> {code}