YARN-8394. Improve data locality documentation for Capacity Scheduler. 
Contributed by Weiwei Yang.


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/29024a62
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/29024a62
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/29024a62

Branch: refs/heads/HDDS-48
Commit: 29024a62038c297f11e8992601f2522ffffc7da7
Parents: 108da85
Author: Weiwei Yang <w...@apache.org>
Authored: Wed Jun 13 09:28:05 2018 +0800
Committer: Weiwei Yang <w...@apache.org>
Committed: Wed Jun 13 09:28:05 2018 +0800

----------------------------------------------------------------------
 .../conf/capacity-scheduler.xml                                 | 2 ++
 .../hadoop-yarn-site/src/site/markdown/CapacityScheduler.md     | 5 +++++
 2 files changed, 7 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/29024a62/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml
index aca6c7c..62654ca 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml
@@ -149,6 +149,8 @@
       attempts to schedule rack-local containers.
       When setting this parameter, the size of the cluster should be taken into account.
       We use 40 as the default value, which is approximately the number of nodes in one rack.
+      Note, if this value is -1, the locality constraint in the container request
+      will be ignored, which disables the delay scheduling.
     </description>
   </property>
 

http://git-wip-us.apache.org/repos/asf/hadoop/blob/29024a62/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
index ef6381a..5be32d4 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
@@ -400,9 +400,14 @@ list of current scheduling edit policies as a comma separated string in `yarn.re
 
   * Data Locality
 
+Capacity Scheduler leverages `Delay Scheduling` to honor task locality constraints. There are 3 levels of locality constraint: node-local, rack-local and off-switch. The scheduler counts the number of missed opportunities when the locality cannot be satisfied, and waits this count to reach a threshold before relaxing the locality constraint to next level. The threshold can be configured in following properties:
+
 | Property | Description |
 |:---- |:---- |
 | `yarn.scheduler.capacity.node-locality-delay` | Number of missed scheduling opportunities after which the CapacityScheduler attempts to schedule rack-local containers. Typically, this should be set to number of nodes in the cluster. By default is setting approximately number of nodes in one rack which is 40. Positive integer value is expected. |
+| `yarn.scheduler.capacity.rack-locality-additional-delay` |  Number of additional missed scheduling opportunities over the node-locality-delay ones, after which the CapacityScheduler attempts to schedule off-switch containers. By default this value is set to -1, in this case, the number of missed opportunities for assigning off-switch containers is calculated based on the formula `L * C / N`, where `L` is number of locations (nodes or racks) specified in the resource request, `C` is the number of requested containers, and `N` is the size of the cluster. |
+
+Note, this feature should be disabled if YARN is deployed separately with the file system, as locality is meaningless. This can be done by setting `yarn.scheduler.capacity.node-locality-delay` to `-1`, in this case, request's locality constraint is ignored.
 
   * Container Allocation per NodeManager Heartbeat
 


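For readers of this commit, the thresholds described in the added documentation can be sketched roughly as follows. This is a minimal illustration based only on the wording of the patch above, not the CapacityScheduler's actual code. The function names, the `>=` comparison, and the order of the checks are all assumptions made for the example.

```python
def off_switch_threshold(node_locality_delay, rack_locality_additional_delay,
                         num_locations, num_containers, cluster_size):
    """Missed opportunities to wait before off-switch placement is allowed.

    Hypothetical helper following the documented description only.
    """
    if rack_locality_additional_delay == -1:
        # Documented default: L * C / N
        return num_locations * num_containers / cluster_size
    # Otherwise: additional delay counted on top of node-locality-delay
    return node_locality_delay + rack_locality_additional_delay


def allowed_locality(missed, node_locality_delay, rack_locality_additional_delay,
                     num_locations, num_containers, cluster_size):
    """Return the loosest locality level allowed after `missed` opportunities."""
    if node_locality_delay == -1:
        # Documented behavior: the locality constraint is ignored entirely
        return "off-switch"
    threshold = off_switch_threshold(node_locality_delay,
                                     rack_locality_additional_delay,
                                     num_locations, num_containers, cluster_size)
    # Assumption: a level is relaxed once the count reaches its threshold
    if missed >= threshold:
        return "off-switch"
    if missed >= node_locality_delay:
        return "rack-local"
    return "node-local"
```

With `node-locality-delay` at 40 and an explicit additional delay of 20, this sketch keeps a request node-local for the first 39 missed opportunities, rack-local from 40, and off-switch from 60.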