wojiaodoubao commented on a change in pull request #2035:
URL: https://github.com/apache/hadoop/pull/2035#discussion_r432282229



##########
File path: 
hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool for balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submits the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally, when the
+  source and the target are the same, it updates the mount table in the Router
+  and moves the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation clusters and
+  router-based federation (RBF) clusters. Take RBF as an example. Suppose we
+  have a mount entry in the Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must include
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use the command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst
+    
+  The option `-router false` indicates that this is not a router-based federation.
+  The source path must include the source cluster.
+
+Command Options
+--------------------
+Command `submit` has 5 options:
+
+| Option key                     | Description                          |
+| ------------------------------ | ------------------------------------ |
+| -router | If `true`, the command runs in router mode: the source path is taken as a mount point, and writes are disabled by setting the mount point read-only. Otherwise the command works in normal federation mode: the source path is taken as the full path, and writes are disabled by cancelling the `x` permission of the source path. The default value is `true`. |
+| -forceCloseOpen | If `true`, the DIFF_DISTCP stage force-closes all open files when there is no diff between the source path and the dst path. Otherwise the DIFF_DISTCP stage waits until there are no open files. The default value is `false`. |
+| -map | Max number of concurrent maps to use for copy. |
+| -bandwidth | Specify bandwidth per map in MB. |
+| -moveToTrash | If `true`, move the source path to trash after the job is done. Otherwise delete the source path directly. |
+
+Configuration Options
+--------------------
+
+| Configuration key              | Description                          |
+| ------------------------------ | ------------------------------------ |
+| hadoop.hdfs.procedure.work.thread.num | The number of worker threads of the BalanceProcedureScheduler. Default is `10`. |
+| hadoop.hdfs.procedure.scheduler.journal.uri | The URI of the journal. |
+| federation.balance.class | The class used for federation balance. Default is `org.apache.hadoop.tools.DistCpProcedure`. |
+
+Architecture of Federation Balance
+----------------------
+
+  The components of the Federation Balance may be classified into the following
+  categories:
+
+  * Balance Procedure Scheduler
+  * DistCpFedBalance
+
+### Balance Procedure Scheduler
+
+  The Balance Procedure Scheduler implements a state machine. It's responsible
+  for scheduling a balance job, including submit, run, delay and recover.
+  The model is shown below:
+
+  ![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)
+
+  * After a job is submitted, the job is added to the pendingQueue.
+  * A worker thread takes a job and runs it. Journals are written to storage.
+  * If writing the journal fails, the job is added to the recoverQueue for later
+    recovery. If a worker thread catches a RetryTaskException, it adds the job to
+    the delayQueue.
+  * The Rooster thread takes a job from delayQueue and adds it back to pendingQueue.
+  * When a scheduler starts, it scans all the unfinished jobs from the
+    journal and adds them to the recoverQueue. The recover thread recovers
+    them from the journal and adds them back to pendingQueue.
+
+### DistCpFedBalance
+
+  DistCpFedBalance is implemented as a job of the state machine. All the distcp
+  balance logic is implemented here. A DistCpFedBalance job consists of 3
+  procedures:
+
+  * DistCpProcedure: This is the first procedure. It handles all the data copy
+    work. There are 6 stages:
+    * PRE_CHECK: Do the pre-check of the src and dst path.
+    * Init Distcp: Create a snapshot of the source path and distcp it to the

Review comment:
       In the INIT_DISTCP stage, it first creates a snapshot and then submits
distcp. The DIFF_DISTCP stage also needs to create a snapshot and submit
distcp, so I combined the snapshot and distcp steps.
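
To illustrate the idea of sharing one "snapshot, then copy" step between the two stages, here is a rough, self-contained sketch. It is not the code in this pull request: `SnapshotCopySketch`, `snapshotAndCopy`, the `snapshot-N` naming, and the recorded action strings are all hypothetical, and the real HDFS snapshot and DistCp calls are replaced by entries in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: the class, enum, and method names are hypothetical
// and are not taken from the pull request.
class SnapshotCopySketch {
  enum Stage { INIT_DISTCP, DIFF_DISTCP }

  // Stand-in for real HDFS/DistCp calls: each step is only recorded as text.
  final List<String> actions = new ArrayList<>();
  private String previousSnapshot = null;
  private int nextSnapshotId = 0;

  // Shared step for both stages: create a snapshot, then submit a copy.
  String snapshotAndCopy(Stage stage, String src, String dst) {
    if (stage == Stage.DIFF_DISTCP && previousSnapshot == null) {
      throw new IllegalStateException("DIFF_DISTCP needs an earlier snapshot");
    }
    String current = "snapshot-" + (nextSnapshotId++);
    actions.add("snapshot " + src + " as " + current);
    if (stage == Stage.INIT_DISTCP) {
      // Initial stage: copy everything captured by the new snapshot.
      actions.add("full copy of " + current + " to " + dst);
    } else {
      // Incremental stage: copy only the changes between the two snapshots.
      actions.add("diff copy " + previousSnapshot + ".." + current + " to " + dst);
    }
    previousSnapshot = current;
    return current;
  }
}
```

Calling `snapshotAndCopy` once with `INIT_DISTCP` and then with `DIFF_DISTCP` records two snapshot entries and two copy entries, which is the pairing that motivated describing the snapshot and the distcp together in a single stage.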



