[ 
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102309#comment-17102309
 ] 

Jinglun commented on HDFS-15294:
--------------------------------

Hi [~ayushtkn], thanks for your nice comments.
{quote}One doubt that I am having is how would EC files would be handled, 
distCp I don't think support preserve EC here at trunk?

And there would couple of other parameters too like storage policy xAttrs and 
stuff that I guess won't get carry forwarded?
{quote}
The hadoop-federation-balance tool relies on distcp, which currently doesn't 
support preserving EC or storage type. We can start a new sub-task to add that 
support to distcp. Once that is done, the hadoop-federation-balance tool can 
support EC and storage type as well.
{quote}While updating the mount point post distCp is success, Is there a chance 
of a Race condition? Data changing in between. We can turn the mount entry read 
only temporarily, if not done already.
{quote}
Currently the source is made read-only by removing the x permission from the 
source path. Using the mount entry is a better choice. As I have split the work 
into 2 patches, the mount entry code is in the second patch. I'll update it and 
upload it after the first patch is approved.
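
To illustrate the permission approach only: clearing the execute bits on a 
directory keeps non-superuser clients from traversing into it. The sketch below 
shows just the mode arithmetic in plain Java. The class and method names are 
hypothetical and are not taken from the patch; the assumption is that the 
resulting mode would be applied with the usual HDFS setPermission call and that 
the original mode is saved for restoring later.

```java
// Hypothetical helper for illustration; not part of the HDFS-15294 patch.
final class ReadOnlyModeSketch {

    /** Mask covering the execute bit for owner, group and other (octal 111). */
    static final int EXECUTE_BITS = 0111;

    /** Returns the given POSIX-style mode with every execute bit cleared. */
    static int withoutExecute(int mode) {
        return mode & ~EXECUTE_BITS;
    }

    public static void main(String[] args) {
        int original = 0755;
        int locked = withoutExecute(original);
        // Assumption: the caller remembers 'original' so that the permission
        // can be restored if the balance job fails or is cancelled.
        System.out.println(Integer.toOctalString(original)
            + " -> " + Integer.toOctalString(locked)); // prints "755 -> 644"
    }
}
```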

Details for MountTableProcedure and TrashProcedure: I'll upload a new pdf 
explaining them.

We can review the patch at HDFS-15340 first. That first patch contains only the 
scheduler model. v07 of the patch on this issue can serve as a reference for 
how the hadoop-federation-balance module would use the scheduler.
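
As a rough picture of what a scheduler model with a journal can look like, here 
is a minimal sketch in plain Java. It is not the code in HDFS-15340: Step, 
InMemoryJournal, run and the step names are all hypothetical, and the resume 
behaviour is an assumption made for illustration. A job is a list of steps, the 
journal records each finished step, and a later run skips whatever is already 
recorded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// All names are hypothetical; they are not the classes in the patch.
final class SchedulerSketch {

    /** One resumable step of a job. execute() returns true once it is done. */
    interface Step {
        String name();
        boolean execute();
    }

    /** Stand-in for a persistent journal: remembers the finished steps. */
    static final class InMemoryJournal {
        private final List<String> log = new ArrayList<>();

        void record(String stepName) { log.add(stepName); }
        int completedSteps() { return log.size(); }
        List<String> log() { return log; }
    }

    /**
     * Runs the steps in order, starting after the last journaled step.
     * Each step is tried up to maxAttempts times. Returns false if a step
     * does not finish, leaving the journal so a later call resumes there.
     */
    static boolean run(List<Step> steps, InMemoryJournal journal, int maxAttempts) {
        for (int i = journal.completedSteps(); i < steps.size(); i++) {
            Step step = steps.get(i);
            boolean done = false;
            for (int attempt = 0; attempt < maxAttempts && !done; attempt++) {
                done = step.execute();
            }
            if (!done) {
                return false;
            }
            journal.record(step.name());
        }
        return true;
    }

    /** Test double: a step that reports done on its roundsNeeded-th call. */
    static Step step(String name, int roundsNeeded) {
        int[] rounds = {0};
        return new Step() {
            @Override public String name() { return name; }
            @Override public boolean execute() { return ++rounds[0] >= roundsNeeded; }
        };
    }

    public static void main(String[] args) {
        List<Step> job = Arrays.asList(
            step("distcp-sync", 3),          // needs three rounds to converge
            step("update-mount-table", 1),
            step("move-src-to-trash", 1));
        InMemoryJournal journal = new InMemoryJournal();

        boolean first = run(job, journal, 2);   // sync not finished after 2 tries
        boolean second = run(job, journal, 2);  // resumes at the sync step
        System.out.println(first + " " + second + " " + journal.log());
    }
}
```

In this sketch the first call returns false with an empty journal, and the 
second call finishes all three steps in order.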

> RBF: Balance data across federation namespaces with DistCp and snapshot diff
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-15294
>                 URL: https://issues.apache.org/jira/browse/HDFS-15294
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch, 
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch, 
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch, 
> HDFS-15294.007.patch, distcp-balance.pdf
>
>
> This jira introduces a new balance command 'fedbalance' that is run by the 
> administrator. The process is:
>  1. Use distcp and snapshot diff to sync data between src and dst until they 
> are the same.
>  2. Update mount table in Router.
>  3. Delete the src to trash.
>  
> The patch is too big to review in one piece, so I split it into 2 patches:
> Phase 1 / The State Machine (BalanceProcedureScheduler): includes the 
> abstractions of the job and scheduler model.   <See HDFS-15340>
> {code:java}
> org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
> org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
> org.apache.hadoop.hdfs.procedure.BalanceProcedure;
> org.apache.hadoop.hdfs.procedure.BalanceJob;
> org.apache.hadoop.hdfs.procedure.BalanceJournal;
> org.apache.hadoop.hdfs.procedure.HDFSJournal;
> {code}
> Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob.    
> <Pending...>
> {code:java}
> org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
> org.apache.hadoop.tools.DistCpFedBalance;
> org.apache.hadoop.tools.DistCpProcedure;
> org.apache.hadoop.tools.FedBalance;
> org.apache.hadoop.tools.FedBalanceConfigs;
> org.apache.hadoop.tools.FedBalanceContext;
> org.apache.hadoop.tools.TrashProcedure;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

