[ 
https://issues.apache.org/jira/browse/HDFS-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103079#comment-17103079
 ] 

Jinglun commented on HDFS-15294:
--------------------------------

Hi [~linyiqun], thanks for your nice comments.
{quote}Current fedbalance tool seems can be only used for the case of RBF, but 
actually this tool can also help balance data in normal federation case. It 
will be better to additionally add an option to allow submit BalancerJob 
without MountTableProcedure.
{quote}
Yes, you are right. In the next patch I'll add an option for the normal 
federation case.
{quote}Now current implementation completely different with the origin design 
in HDFS-15087 which uses the way like, block writes -> saveTree -> graftTree -> 
hardlink -> update router mount table. Does that mean you change the way that 
you implemented before? But anyway, current fedbalance tool way looks good, it 
doesn't lock the folder and can minimum the user impact. [~LiJinglun], if you 
already have a new design based on fedbalance way, please update the new design 
doc. Thanks.
{quote}
Balancing in RBF is a big piece of work, so I should split it into many 
sub-tasks. I think starting with distcp diff is a good choice. I haven't given 
up on the HDFS-15087 approach: it impacts users a lot, but it is more 
efficient. Starting with it is not a good idea, though, as it is not universal 
and covers only a few scenarios. In the future we might be able to use it for 
fast renames of small paths across federation sub-clusters.

The distcp-balance.v2.pdf is the design doc for this jira. Maybe I should use 
a Google doc instead, so that all the suggestions can be incorporated promptly.

Another question: all the suggestions seem to be related to the 
DistCpFedBalance part, which is in the second patch. Could you help review 
the first patch so I can start working on the second one? Thanks very much :D !

> RBF: Balance data across federation namespaces with DistCp and snapshot diff
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-15294
>                 URL: https://issues.apache.org/jira/browse/HDFS-15294
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: BalanceProcedureScheduler.png, HDFS-15294.001.patch, 
> HDFS-15294.002.patch, HDFS-15294.003.patch, HDFS-15294.003.reupload.patch, 
> HDFS-15294.004.patch, HDFS-15294.005.patch, HDFS-15294.006.patch, 
> HDFS-15294.007.patch, distcp-balance.pdf, distcp-balance.v2.pdf
>
>
> This jira introduces a new balance command 'fedbalance' that is run by the 
> administrator. The process is:
>  1. Use distcp and snapshot diff to sync data between src and dst until they 
> are the same.
>  2. Update mount table in Router.
>  3. Delete the src to trash.
>  
> The patch is too big to review, so I split it into 2 patches:
> Phase 1 / The State Machine(BalanceProcedureScheduler): Including the 
> abstraction of job and scheduler model.   <See HDFS-15340>
> {code:java}
> org.apache.hadoop.hdfs.procedure.BalanceProcedureScheduler;
> org.apache.hadoop.hdfs.procedure.BalanceProcedureConfigKeys;
> org.apache.hadoop.hdfs.procedure.BalanceProcedure;
> org.apache.hadoop.hdfs.procedure.BalanceJob;
> org.apache.hadoop.hdfs.procedure.BalanceJournal;
> org.apache.hadoop.hdfs.procedure.HDFSJournal;
> {code}
> Phase 2 / The DistCpFedBalance: It's an implementation of BalanceJob.    
> <Pending...>
> {code:java}
> org.apache.hadoop.hdfs.server.federation.procedure.MountTableProcedure;
> org.apache.hadoop.tools.DistCpFedBalance;
> org.apache.hadoop.tools.DistCpProcedure;
> org.apache.hadoop.tools.FedBalance;
> org.apache.hadoop.tools.FedBalanceConfigs;
> org.apache.hadoop.tools.FedBalanceContext;
> org.apache.hadoop.tools.TrashProcedure;
> {code}
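
To illustrate the three-step process and the job/procedure split described above, here is a minimal sketch of a job that runs its steps in order, retries a step that is not finished yet, and records each outcome. It is not the code from the patches: SketchJob, FedBalanceSketch, the step names and the in-memory journal list are all hypothetical, and the step bodies are placeholders that do not call DistCp, the Router, or HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical job: runs named steps in order and journals each outcome. */
class SketchJob {
    /** One step; the body returns true when done, false when it needs another attempt. */
    private static final class Step {
        final String name;
        final BooleanSupplier body;
        Step(String name, BooleanSupplier body) { this.name = name; this.body = body; }
    }

    private final List<Step> steps = new ArrayList<>();
    private final List<String> journal = new ArrayList<>(); // assumption: in-memory only
    private int next = 0; // index of the first step that has not finished yet

    SketchJob add(String name, BooleanSupplier body) {
        steps.add(new Step(name, body));
        return this;
    }

    /** Runs the remaining steps; gives up once a step fails more than maxRetries times. */
    boolean run(int maxRetries) {
        while (next < steps.size()) {
            Step step = steps.get(next);
            int failures = 0;
            while (!step.body.getAsBoolean()) {
                if (++failures > maxRetries) {
                    journal.add("FAILED " + step.name);
                    return false;
                }
            }
            journal.add("DONE " + step.name);
            next++; // a later run() call resumes after the last finished step
        }
        return true;
    }

    List<String> journal() { return journal; }
}

class FedBalanceSketch {
    /** Builds the three steps from the description with placeholder bodies. */
    static SketchJob threeStepJob() {
        int[] pendingRounds = {2}; // pretend two more diff rounds are needed before src == dst
        return new SketchJob()
            .add("distcp-sync", () -> pendingRounds[0]-- <= 0)
            .add("update-mount-table", () -> true)
            .add("move-src-to-trash", () -> true);
    }

    public static void main(String[] args) {
        SketchJob job = threeStepJob();
        System.out.println(job.run(3) + " " + job.journal());
    }
}
```

Keeping the index of the next unfinished step separate from the step bodies is what would let a real scheduler persist progress and recover a job after a restart. Here that state lives only in memory.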



