goiri commented on a change in pull request #2035:
URL: https://github.com/apache/hadoop/pull/2035#discussion_r431392232



##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial

Review comment:
       submits

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target

Review comment:
       link to distcp

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue

Review comment:
       Should this have code formatting like line 73? (I think there might even be a better way.)
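
One possible form of that "better way" is a fenced code block instead of inline backticks or plain indentation. The fragment below is only a sketch: it assumes the site's Markdown renderer supports fenced blocks, which is not confirmed here, and it reuses the command from the quoted diff unchanged.

````markdown
  If the hadoop shell process exits unexpectedly and we want to continue the
  unfinished job, we can use command:

```
bash$ /bin/hadoop fedbalance continue
```
````

If fences are not supported by the site build, keeping the four-space indented form used for the other commands in the file would at least make the examples consistent.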

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    `bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst`
+    
+  The option `-router false` indicates this is not in router-based federation.
+  The source path must includes the source cluster.
+
+Command Options
+--------------------
+Command `submit` has 5 options:
+
+| Option key                     | Description                          |
+| ------------------------------ | ------------------------------------ |
+| -router | If `true` the command runs in router mode. The source path is taken as a mount point. It will disable write by setting the mount point readonly. Otherwise the command works in normal federation mode. The source path is taken as the full path. It will disable write by cancelling the `x` permission of the source path. The default value is `true`. |
+| -forceCloseOpen | If `true`, in DIFF_DISTCP stage it will force close all open files when there is no diff between the source path and the dst path. Otherwise the DIFF_DISTCP stage will wait until there is no open files. The default value is `false`. |
+| -map | Max number of concurrent maps to use for copy. |
+| -bandwidth | Specify bandwidth per map in MB. |
+| -moveToTrash | If `true` move the source path to trash after the job is done. Otherwise delete the source path directly. |
+
+Configuration Options
+--------------------
+
+| Configuration key              | Description                          |
+| ------------------------------ | ------------------------------------ |
+| hadoop.hdfs.procedure.work.thread.num | The worker threads number of the BalanceProcedureScheduler. Default is `10`. |
+| hadoop.hdfs.procedure.scheduler.journal.uri | The uri of the journal. |
+| federation.balance.class | The class used for federation balance. Default is `org.apache.hadoop.tools.DistCpProcedure.` |
+
+Architecture of Federation Balance
+----------------------
+
+  The components of the Federation Balance may be classified into the following
+  categories:
+
+  * Balance Procedure Scheduler
+  * DistCpFedBalance
+
+### Balance Procedure Scheduler
+
+  The Balance Procedure Scheduler implements a state machine. It's responsible
+  for scheduling a balance job, including submit, run, delay and recover.
+  The model is showed below:
+
+  ![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)
+
+  * After a job is submitted, the job is added to the pendingQueue.
+  * Worker thread takes job and run it. Journals are written to storage.
+  * If writing journal fails, the job is added to the recoverQueue for later

Review comment:
       the journal

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    `bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst`
+    
+  The option `-router false` indicates this is not in router-based federation.
+  The source path must includes the source cluster.
+
+Command Options
+--------------------
+Command `submit` has 5 options:
+
+| Option key                     | Description                          |
+| ------------------------------ | ------------------------------------ |
+| -router | If `true` the command runs in router mode. The source path is taken as a mount point. It will disable write by setting the mount point readonly. Otherwise the command works in normal federation mode. The source path is taken as the full path. It will disable write by cancelling the `x` permission of the source path. The default value is `true`. |
+| -forceCloseOpen | If `true`, in DIFF_DISTCP stage it will force close all open files when there is no diff between the source path and the dst path. Otherwise the DIFF_DISTCP stage will wait until there is no open files. The default value is `false`. |
+| -map | Max number of concurrent maps to use for copy. |
+| -bandwidth | Specify bandwidth per map in MB. |
+| -moveToTrash | If `true` move the source path to trash after the job is done. Otherwise delete the source path directly. |
+
+Configuration Options
+--------------------
+
+| Configuration key              | Description                          |
+| ------------------------------ | ------------------------------------ |
+| hadoop.hdfs.procedure.work.thread.num | The worker threads number of the BalanceProcedureScheduler. Default is `10`. |
+| hadoop.hdfs.procedure.scheduler.journal.uri | The uri of the journal. |
+| federation.balance.class | The class used for federation balance. Default is `org.apache.hadoop.tools.DistCpProcedure.` |
+
+Architecture of Federation Balance
+----------------------
+
+  The components of the Federation Balance may be classified into the following
+  categories:
+
+  * Balance Procedure Scheduler
+  * DistCpFedBalance
+
+### Balance Procedure Scheduler
+
+  The Balance Procedure Scheduler implements a state machine. It's responsible
+  for scheduling a balance job, including submit, run, delay and recover.
+  The model is showed below:
+
+  ![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)
+
+  * After a job is submitted, the job is added to the pendingQueue.
+  * Worker thread takes job and run it. Journals are written to storage.
+  * If writing journal fails, the job is added to the recoverQueue for later
+    recovery. If Worker thread catches a RetryTaskException, it adds the job to
+    the delayQueue.
+  * Rooster thread takes job from delayQueue and adds it back to pendingQueue.
+  * When a scheduler starts, it will scan all the unfinished jobs from
+    journal and add them to the recoverQueue. The recover thread will recover
+    them from journal and add them back to pendingQueue.

Review comment:
       the journal

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    `bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst`
+    
+  The option `-router false` indicates this is not in router-based federation.
+  The source path must includes the source cluster.
+
+Command Options
+--------------------
+Command `submit` has 5 options:
+
+| Option key                     | Description                          |
+| ------------------------------ | ------------------------------------ |
+| -router | If `true` the command runs in router mode. The source path is taken as a mount point. It will disable write by setting the mount point readonly. Otherwise the command works in normal federation mode. The source path is taken as the full path. It will disable write by cancelling the `x` permission of the source path. The default value is `true`. |
+| -forceCloseOpen | If `true`, in DIFF_DISTCP stage it will force close all open files when there is no diff between the source path and the dst path. Otherwise the DIFF_DISTCP stage will wait until there is no open files. The default value is `false`. |
+| -map | Max number of concurrent maps to use for copy. |
+| -bandwidth | Specify bandwidth per map in MB. |
+| -moveToTrash | If `true` move the source path to trash after the job is done. Otherwise delete the source path directly. |
+
+Configuration Options
+--------------------
+
+| Configuration key              | Description                          |
+| ------------------------------ | ------------------------------------ |
+| hadoop.hdfs.procedure.work.thread.num | The worker threads number of the BalanceProcedureScheduler. Default is `10`. |
+| hadoop.hdfs.procedure.scheduler.journal.uri | The uri of the journal. |
+| federation.balance.class | The class used for federation balance. Default is `org.apache.hadoop.tools.DistCpProcedure.` |
+
+Architecture of Federation Balance
+----------------------
+
+  The components of the Federation Balance may be classified into the following
+  categories:
+
+  * Balance Procedure Scheduler
+  * DistCpFedBalance
+
+### Balance Procedure Scheduler
+
+  The Balance Procedure Scheduler implements a state machine. It's responsible
+  for scheduling a balance job, including submit, run, delay and recover.
+  The model is showed below:
+
+  ![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)
+
+  * After a job is submitted, the job is added to the pendingQueue.
+  * Worker thread takes job and run it. Journals are written to storage.
+  * If writing journal fails, the job is added to the recoverQueue for later
+    recovery. If Worker thread catches a RetryTaskException, it adds the job to
+    the delayQueue.
+  * Rooster thread takes job from delayQueue and adds it back to pendingQueue.
+  * When a scheduler starts, it will scan all the unfinished jobs from
+    journal and add them to the recoverQueue. The recover thread will recover
+    them from journal and add them back to pendingQueue.
+
+### DistCpFedBalance
+
+  DistCpFedBalance is implemented as a job of the state machine. All the distcp
+  balance logic are implemented here. A DistCpFedBalance job consists of 3
+  procedures:
+
+  * DistCpProcedure: This is the first procedure. It handles all the data copy
+    works. There are 6 stages:
+    * PRE_CHECK: Do the pre-check of the src and dst path.
+    * Init Distcp: Create a snapshot of the source path and distcp it to the

Review comment:
       Should this be two stages? One for snapshot and the other for distcp?

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    `bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst`
+    
+  The option `-router false` indicates this is not in router-based federation.
+  The source path must includes the source cluster.
+
+Command Options
+--------------------
+Command `submit` has 5 options:
+
+| Option key                     | Description                          |
+| ------------------------------ | ------------------------------------ |
+| -router | If `true` the command runs in router mode. The source path is taken as a mount point. It will disable write by setting the mount point readonly. Otherwise the command works in normal federation mode. The source path is taken as the full path. It will disable write by cancelling the `x` permission of the source path. The default value is `true`. |
+| -forceCloseOpen | If `true`, in DIFF_DISTCP stage it will force close all open files when there is no diff between the source path and the dst path. Otherwise the DIFF_DISTCP stage will wait until there is no open files. The default value is `false`. |
+| -map | Max number of concurrent maps to use for copy. |
+| -bandwidth | Specify bandwidth per map in MB. |
+| -moveToTrash | If `true` move the source path to trash after the job is done. Otherwise delete the source path directly. |
+
+Configuration Options
+--------------------
+
+| Configuration key              | Description                          |
+| ------------------------------ | ------------------------------------ |
+| hadoop.hdfs.procedure.work.thread.num | The worker threads number of the BalanceProcedureScheduler. Default is `10`. |
+| hadoop.hdfs.procedure.scheduler.journal.uri | The uri of the journal. |
+| federation.balance.class | The class used for federation balance. Default is `org.apache.hadoop.tools.DistCpProcedure.` |
+
+Architecture of Federation Balance
+----------------------
+
+  The components of the Federation Balance may be classified into the following
+  categories:
+
+  * Balance Procedure Scheduler
+  * DistCpFedBalance
+
+### Balance Procedure Scheduler
+
+  The Balance Procedure Scheduler implements a state machine. It's responsible
+  for scheduling a balance job, including submit, run, delay and recover.
+  The model is showed below:
+
+  ![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)
+
+  * After a job is submitted, the job is added to the pendingQueue.
+  * Worker thread takes job and run it. Journals are written to storage.
+  * If writing journal fails, the job is added to the recoverQueue for later
+    recovery. If Worker thread catches a RetryTaskException, it adds the job to
+    the delayQueue.
+  * Rooster thread takes job from delayQueue and adds it back to pendingQueue.
+  * When a scheduler starts, it will scan all the unfinished jobs from
+    journal and add them to the recoverQueue. The recover thread will recover
+    them from journal and add them back to pendingQueue.
+
+### DistCpFedBalance
+
+  DistCpFedBalance is implemented as a job of the state machine. All the distcp
+  balance logic are implemented here. A DistCpFedBalance job consists of 3
+  procedures:
+
+  * DistCpProcedure: This is the first procedure. It handles all the data copy
+    works. There are 6 stages:
+    * PRE_CHECK: Do the pre-check of the src and dst path.
+    * Init Distcp: Create a snapshot of the source path and distcp it to the
+      target.
+    * Diff Distcp: Submit distcp with `-diff` round by round to sync source and
+      target paths. If `-forceCloseOpen` is set, this stage will finish when
+      there is no diff between src and dst. Otherwise this stage only finishes
+      when there is no diff and no open files.  
+    * DISABLE_WRITE: Disable write operations so the src won't be changed. When
+      working in router mode, it is done by making the mount point readonly.
+      Otherwise then it is done by cancelling the `x` permission of the source
+      path.
+    * Final Distcp(optional): Force close all the open files and submit the
+      final distcp.
+    * FINISH: Cleanup works. If the 'x' permission is cancelled then restoring

Review comment:
       Kind of weird that some stage names are in capitals and others are not.

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)

Review comment:
       Isn't the TOC already generated by some command?

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.

Review comment:
       In this doc, usage is first and then design.

##########
File path: hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    `bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst`
+    
+  The option `-router false` indicates this is not in router-based federation.
+  The source path must includes the source cluster.
+
+Command Options

Review comment:
       This should be a subitem of Usage, right?

##########
File path: 
hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    `bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst`
+    
+  The option `-router false` indicates this is not in router-based federation.
+  The source path must includes the source cluster.
+
+Command Options
+--------------------
+Command `submit` has 5 options:
+
+| Option key                     | Description                          |
+| ------------------------------ | ------------------------------------ |
+| -router | If `true` the command runs in router mode. The source path is taken as a mount point. It will disable write by setting the mount point readonly. Otherwise the command works in normal federation mode. The source path is taken as the full path. It will disable write by cancelling the `x` permission of the source path. The default value is `true`. |
+| -forceCloseOpen | If `true`, in DIFF_DISTCP stage it will force close all open files when there is no diff between the source path and the dst path. Otherwise the DIFF_DISTCP stage will wait until there is no open files. The default value is `false`. |
+| -map | Max number of concurrent maps to use for copy. |
+| -bandwidth | Specify bandwidth per map in MB. |
+| -moveToTrash | If `true` move the source path to trash after the job is done. Otherwise delete the source path directly. |
+
+Configuration Options
+--------------------
+
+| Configuration key              | Description                          |
+| ------------------------------ | ------------------------------------ |
+| hadoop.hdfs.procedure.work.thread.num | The worker threads number of the BalanceProcedureScheduler. Default is `10`. |
+| hadoop.hdfs.procedure.scheduler.journal.uri | The uri of the journal. |
+| federation.balance.class | The class used for federation balance. Default is `org.apache.hadoop.tools.DistCpProcedure.` |
+
+Architecture of Federation Balance
+----------------------
+
+  The components of the Federation Balance may be classified into the following
+  categories:
+
+  * Balance Procedure Scheduler
+  * DistCpFedBalance
+
+### Balance Procedure Scheduler
+
+  The Balance Procedure Scheduler implements a state machine. It's responsible
+  for scheduling a balance job, including submit, run, delay and recover.
+  The model is showed below:
+
+  ![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)
+
+  * After a job is submitted, the job is added to the pendingQueue.
+  * Worker thread takes job and run it. Journals are written to storage.

Review comment:
       The worker threads

##########
File path: 
hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    `bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst`
+    
+  The option `-router false` indicates this is not in router-based federation.
+  The source path must includes the source cluster.
+
+Command Options
+--------------------
+Command `submit` has 5 options:
+
+| Option key                     | Description                          |
+| ------------------------------ | ------------------------------------ |
+| -router | If `true` the command runs in router mode. The source path is taken as a mount point. It will disable write by setting the mount point readonly. Otherwise the command works in normal federation mode. The source path is taken as the full path. It will disable write by cancelling the `x` permission of the source path. The default value is `true`. |
+| -forceCloseOpen | If `true`, in DIFF_DISTCP stage it will force close all open files when there is no diff between the source path and the dst path. Otherwise the DIFF_DISTCP stage will wait until there is no open files. The default value is `false`. |
+| -map | Max number of concurrent maps to use for copy. |
+| -bandwidth | Specify bandwidth per map in MB. |
+| -moveToTrash | If `true` move the source path to trash after the job is done. Otherwise delete the source path directly. |
+
+Configuration Options
+--------------------
+
+| Configuration key              | Description                          |
+| ------------------------------ | ------------------------------------ |
+| hadoop.hdfs.procedure.work.thread.num | The worker threads number of the BalanceProcedureScheduler. Default is `10`. |
+| hadoop.hdfs.procedure.scheduler.journal.uri | The uri of the journal. |
+| federation.balance.class | The class used for federation balance. Default is `org.apache.hadoop.tools.DistCpProcedure.` |
+
+Architecture of Federation Balance
+----------------------
+
+  The components of the Federation Balance may be classified into the following
+  categories:
+
+  * Balance Procedure Scheduler
+  * DistCpFedBalance
+
+### Balance Procedure Scheduler
+
+  The Balance Procedure Scheduler implements a state machine. It's responsible
+  for scheduling a balance job, including submit, run, delay and recover.
+  The model is showed below:
+
+  ![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)
+
+  * After a job is submitted, the job is added to the pendingQueue.
+  * Worker thread takes job and run it. Journals are written to storage.
+  * If writing journal fails, the job is added to the recoverQueue for later
+    recovery. If Worker thread catches a RetryTaskException, it adds the job to
+    the delayQueue.
+  * Rooster thread takes job from delayQueue and adds it back to pendingQueue.
+  * When a scheduler starts, it will scan all the unfinished jobs from

Review comment:
       it scans

##########
File path: 
hadoop-tools/hadoop-federation-balance/src/site/markdown/FederationBalance.md
##########
@@ -0,0 +1,156 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Federation Balance Guide
+=====================
+
+---
+
+ - [Overview](#Overview)
+ - [Usage](#Usage)
+     - [Basic Usage](#Basic_Usage)
+ - [Command Options](#Command_Options)
+ - [Configuration Options](#Configuration_Options)
+ - [Architecture of Federation Balance](#Architecture_of_Federation_Balance)
+
+---
+
+Overview
+--------
+
+  Federation Balance is a tool balancing data across different federation
+  namespaces. It uses DistCp to copy data from the source path to the target
+  path. First it creates a snapshot at the source path and submit the initial
+  distcp. Then it uses distcp diff to do the incremental copy. Finally when the
+  source and the target are the same, it updates the mount table in Router and
+  move the source to trash.
+
+  This document aims to describe the design and usage of the Federation Balance.
+
+Usage
+-----
+
+### Basic Usage
+
+  The federation balance tool supports both normal federation cluster and
+  router-based federation cluster. Taking rbf for example. Supposing we have a
+  mount entry in Router:
+
+    /foo/src --> hdfs://nn0:8020/foo/src
+
+  Submit a federation balance job locally. The first parameter should be a mount
+  entry. The second parameter is the target path. The target path must includes
+  the target cluster.
+
+    bash$ /bin/hadoop fedbalance submit /foo/src hdfs://nn1:8020/foo/dst
+
+  This will copy data from hdfs://nn0:8020/foo/src to hdfs://nn1:8020/foo/dst
+  incrementally and finally update the mount entry to:
+
+    /foo/src --> hdfs://nn1:8020/foo/dst
+
+  If the hadoop shell process exits unexpectedly and we want to continue the
+  unfinished job, we can use command:
+
+    bash$ /bin/hadoop fedbalance continue
+
+  This will scan the journal to find all the unfinished jobs, recover and
+  continue to execute them.
+  
+  If we want to balance in a normal federation cluster, use the command below.
+  
+    `bash$ /bin/hadoop fedbalance -router false submit hdfs://nn0:8020/foo/src hdfs://nn1:8020/foo/dst`
+    
+  The option `-router false` indicates this is not in router-based federation.
+  The source path must includes the source cluster.
+
+Command Options
+--------------------
+Command `submit` has 5 options:
+
+| Option key                     | Description                          |
+| ------------------------------ | ------------------------------------ |
+| -router | If `true` the command runs in router mode. The source path is taken as a mount point. It will disable write by setting the mount point readonly. Otherwise the command works in normal federation mode. The source path is taken as the full path. It will disable write by cancelling the `x` permission of the source path. The default value is `true`. |
+| -forceCloseOpen | If `true`, in DIFF_DISTCP stage it will force close all open files when there is no diff between the source path and the dst path. Otherwise the DIFF_DISTCP stage will wait until there is no open files. The default value is `false`. |
+| -map | Max number of concurrent maps to use for copy. |
+| -bandwidth | Specify bandwidth per map in MB. |
+| -moveToTrash | If `true` move the source path to trash after the job is done. Otherwise delete the source path directly. |
+
+Configuration Options
+--------------------
+
+| Configuration key              | Description                          |
+| ------------------------------ | ------------------------------------ |
+| hadoop.hdfs.procedure.work.thread.num | The worker threads number of the BalanceProcedureScheduler. Default is `10`. |
+| hadoop.hdfs.procedure.scheduler.journal.uri | The uri of the journal. |
+| federation.balance.class | The class used for federation balance. Default is `org.apache.hadoop.tools.DistCpProcedure.` |
+
+Architecture of Federation Balance
+----------------------
+
+  The components of the Federation Balance may be classified into the following
+  categories:
+
+  * Balance Procedure Scheduler
+  * DistCpFedBalance
+
+### Balance Procedure Scheduler
+
+  The Balance Procedure Scheduler implements a state machine. It's responsible
+  for scheduling a balance job, including submit, run, delay and recover.
+  The model is showed below:
+
+  ![Balance Procedure Scheduler](images/BalanceProcedureScheduler.png)
+
+  * After a job is submitted, the job is added to the pendingQueue.
+  * Worker thread takes job and run it. Journals are written to storage.
+  * If writing journal fails, the job is added to the recoverQueue for later
+    recovery. If Worker thread catches a RetryTaskException, it adds the job to
+    the delayQueue.
+  * Rooster thread takes job from delayQueue and adds it back to pendingQueue.
+  * When a scheduler starts, it will scan all the unfinished jobs from
+    journal and add them to the recoverQueue. The recover thread will recover
+    them from journal and add them back to pendingQueue.
+
+### DistCpFedBalance
+
+  DistCpFedBalance is implemented as a job of the state machine. All the distcp
+  balance logic are implemented here. A DistCpFedBalance job consists of 3
+  procedures:
+
+  * DistCpProcedure: This is the first procedure. It handles all the data copy
+    works. There are 6 stages:
+    * PRE_CHECK: Do the pre-check of the src and dst path.
+    * Init Distcp: Create a snapshot of the source path and distcp it to the
+      target.
+    * Diff Distcp: Submit distcp with `-diff` round by round to sync source and
+      target paths. If `-forceCloseOpen` is set, this stage will finish when
+      there is no diff between src and dst. Otherwise this stage only finishes
+      when there is no diff and no open files.  
+    * DISABLE_WRITE: Disable write operations so the src won't be changed. When
+      working in router mode, it is done by making the mount point readonly.
+      Otherwise then it is done by cancelling the `x` permission of the source

Review comment:
       x permission?



