Repository: hadoop
Updated Branches:
  refs/heads/branch-2.8 c487453b9 -> aeea77ce1


YARN-4100. Add Documentation for Distributed and Delegated-Centralized
Node Labels feature. Contributed by Naganarasimha G R.

(cherry picked from commit db144eb1c51c1f37bdd1e0c18e9a5b0969c82e33)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/aeea77ce
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/aeea77ce
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/aeea77ce

Branch: refs/heads/branch-2.8
Commit: aeea77ce147c8f53a868274654df693437e1c435
Parents: c487453
Author: Devaraj K <deva...@apache.org>
Authored: Tue Feb 2 12:06:51 2016 +0530
Committer: Devaraj K <deva...@apache.org>
Committed: Tue Feb 2 12:08:56 2016 +0530

----------------------------------------------------------------------
 hadoop-yarn-project/CHANGES.txt                 |  3 +
 .../src/main/resources/yarn-default.xml         | 50 ++++++------
 .../src/site/markdown/NodeLabel.md              | 86 ++++++++++++++++----
 3 files changed, 99 insertions(+), 40 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/aeea77ce/hadoop-yarn-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt
index 901a1eb..636db91 100644
--- a/hadoop-yarn-project/CHANGES.txt
+++ b/hadoop-yarn-project/CHANGES.txt
@@ -592,6 +592,9 @@ Release 2.8.0 - UNRELEASED
 
     YARN-4340. Add "list" API to reservation system. (Sean Po via wangda)
 
+    YARN-4100. Add Documentation for Distributed and Delegated-Centralized
+    Node Labels feature. (Naganarasimha G R via devaraj)
+
   OPTIMIZATIONS
 
     YARN-3339. TestDockerContainerExecutor should pull a single image and not

http://git-wip-us.apache.org/repos/asf/hadoop/blob/aeea77ce/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
index 0add988..80f0fea 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
@@ -2281,26 +2281,26 @@
   <!-- Distributed Node Labels Configuration -->
   <property>
     <description>
-    When "yarn.node-labels.configuration-type" parameter in RM is configured as
-    "distributed", Administrators can configure in NM, the provider for the
+    When "yarn.node-labels.configuration-type" is configured with "distributed"
+    in RM, Administrators can configure in NM the provider for the
     node labels by configuring this parameter. Administrators can
-    specify "config", "script" or the class name of the provider. Configured
+    configure "config", "script" or the class name of the provider. Configured
     class needs to extend
     org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeLabelsProvider.
-    If "config" is specified then "ConfigurationNodeLabelsProvider" and
-    "script" then "ScriptNodeLabelsProvider" will be used.
+    If "config" is configured, then "ConfigurationNodeLabelsProvider" and if
+    "script" is configured, then "ScriptNodeLabelsProvider" will be used.
     </description>
     <name>yarn.nodemanager.node-labels.provider</name>
   </property>
 
   <property>
     <description>
-    When node labels "yarn.nodemanager.node-labels.provider" is of type
-    "config" or the configured class extends AbstractNodeLabelsProvider then
-    periodically node labels are retrieved from the node labels provider.
-    This configuration is to define the interval. If -1 is configured then
-    node labels are retrieved from. provider only during initialization.
-    Defaults to 10 mins.
+    When "yarn.nodemanager.node-labels.provider" is configured with "config",
+    "script" or the configured class extends AbstractNodeLabelsProvider, then
+    periodically node labels are retrieved from the node labels provider. This
+    configuration is to define the interval.
+    If -1 is configured then node labels are retrieved from the provider only
+    during initialization. Defaults to 10 mins.
     </description>
     <name>yarn.nodemanager.node-labels.provider.fetch-interval-ms</name>
     <value>600000</value>
@@ -2308,8 +2308,8 @@
 
   <property>
     <description>
-   Interval at which node labels syncs with RM from NM.Will send loaded labels
-   every x intervals configured along with heartbeat from NM to RM.
+   Interval at which NM syncs its node labels with RM. NM will send its loaded
+   labels along with the heartbeat to RM at every configured interval.
     </description>
     <name>yarn.nodemanager.node-labels.resync-interval-ms</name>
     <value>120000</value>
@@ -2317,19 +2317,18 @@
 
   <property>
     <description>
-    When node labels "yarn.nodemanager.node-labels.provider"
-    is of type "config" then ConfigurationNodeLabelsProvider fetches the
-    partition from this parameter.
+    When "yarn.nodemanager.node-labels.provider" is configured with "config"
+    then ConfigurationNodeLabelsProvider fetches the partition label from this
+    parameter.
    </description>
    <name>yarn.nodemanager.node-labels.provider.configured-node-partition</name>
   </property>
 
   <property>
     <description>
-    When node labels "yarn.nodemanager.node-labels.provider" is a class
-    which extends AbstractNodeLabelsProvider then this configuration provides
-    the timeout period after which it will stop querying the Node labels
-    provider. Defaults to 20 mins.
+    When "yarn.nodemanager.node-labels.provider" is configured with "script"
+    then this configuration provides the timeout period after which it will
+    interrupt the script which queries the node labels. Defaults to 20 mins.
     </description>
     <name>yarn.nodemanager.node-labels.provider.fetch-timeout-ms</name>
     <value>1200000</value>
@@ -2351,8 +2350,8 @@
 
   <property>
     <description>
-    When node labels "yarn.node-labels.configuration-type" is of type
-    "delegated-centralized" then periodically node labels are retrieved
+    When "yarn.node-labels.configuration-type" is configured with
+    "delegated-centralized", then periodically node labels are retrieved
     from the node labels provider. This configuration is to define the
     interval. If -1 is configured then node labels are retrieved from
     provider only once for each node after it registers. Defaults to 30 mins.
@@ -2362,9 +2361,10 @@
   </property>
 
   <property>
-    <description>The Node Label script to run. Script output Lines starting with
-     "NODE_PARTITION:" will be considered for Node Labels. In case of multiple
-     lines having the pattern, last one will be considered</description>
+    <description>The Node Label script to run. Script output lines starting with
+     "NODE_PARTITION:" will be considered as the node label partition. In case
+     multiple lines have this pattern, the last one will be considered.
+    </description>
     <name>yarn.nodemanager.node-labels.provider.script.path</name>
   </property>
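To make the script contract above concrete, here is a minimal sketch of such a node label script. It is illustrative only: the GPU-device probe and the `gpu`/`cpu` label names are assumptions, not part of this patch, and whatever label the script emits must already be in the cluster's node labels list on the RM.

```shell
#!/usr/bin/env bash
# Hypothetical script for yarn.nodemanager.node-labels.provider.script.path.
# NM parses stdout: lines starting with "NODE_PARTITION:" set the node's
# partition label; if several such lines appear, the last one wins.
if ls /dev/nvidia* >/dev/null 2>&1; then
  echo "NODE_PARTITION:gpu"   # node has NVIDIA devices
else
  echo "NODE_PARTITION:cpu"   # fall back to a default partition label
fi
```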
 

http://git-wip-us.apache.org/repos/asf/hadoop/blob/aeea77ce/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md
index 87019cd..1fecf07 100644
--- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeLabel.md
@@ -15,7 +15,22 @@
 YARN Node Labels
 ===============
 
-# Overview
+* [Overview](#Overview)
+* [Features](#Features)
+* [Configuration](#Configuration)
+    * [Setting up ResourceManager to enable Node Labels](#Setting_up_ResourceManager_to_enable_Node_Labels)
+    * [Add/modify node labels list to YARN](#Add/modify_node_labels_list_to_YARN)
+    * [Add/modify node-to-labels mapping to YARN](#Add/modify_node-to-labels_mapping_to_YARN)
+    * [Configuration of Schedulers for node labels](#Configuration_of_Schedulers_for_node_labels)
+* [Specifying node label for application](#Specifying_node_label_for_application)
+* [Monitoring](#Monitoring)
+    * [Monitoring through web UI](#Monitoring_through_web_UI)
+    * [Monitoring through commandline](#Monitoring_through_commandline)
+* [Useful links](#Useful_links)
+
+Overview
+--------
+
 Node label is a way to group nodes with similar characteristics and applications can specify where to run.
 
 Now we only support node partition, which is:
@@ -28,20 +43,28 @@ Now we only support node partition, which is:
 
 User can specify set of node labels which can be accessed by each queue, one application can only use subset of node labels that can be accessed by the queue which contains the application.
 
-# Features
+Features
+--------
+
 The ```Node Labels``` supports the following features for now:
 
 * Partition cluster - each node can be assigned one label, so the cluster will be divided to several smaller disjoint partitions.
 * ACL of node-labels on queues - user can set accessible node labels on each queue so only some nodes can only be accessed by specific queues.
 * Specify percentage of resource of a partition which can be accessed by a queue - user can set percentage like: queue A can access 30% of resources on nodes with label=hbase. Such percentage setting will be consistent with existing resource manager
-* Specify required Node Label in resource request, it will only be allocated when node has the same label. If no node label requirement specified, such Resource Request will only be allocated on nodes belong to DEFAULT partition.
+* Specify required node label in resource request, it will only be allocated when node has the same label. If no node label requirement specified, such Resource Request will only be allocated on nodes belong to DEFAULT partition.
 * Operability
     * Node labels and node labels mapping can be recovered across RM restart
     * Update node labels - admin can update labels on nodes and labels on queues
       when RM is running
+* Mapping of NM to node labels can be done in three ways, but in all of these approaches the partition label should be one among the valid node labels configured in the RM.
+    * **Centralized :** Node to labels mapping can be done through RM exposed CLI, REST or RPC.
+    * **Distributed :** Node to labels mapping will be set by a configured Node Labels Provider in NM. We have two different providers in YARN: *Script* based provider and *Configuration* based provider. In case of script, NM can be configured with a script path and the script can emit the labels of the node. In case of config, node labels can be directly configured in the NM's yarn-site.xml. In both of these options dynamic refresh of the label mapping is supported.
+    * **Delegated-Centralized :** Node to labels mapping will be set by a configured Node Labels Provider in RM. This is helpful when label mapping cannot be provided by each node due to security concerns, and to avoid interaction through RM interfaces for each node in a large cluster. Labels will be fetched from this interface during NM registration, and periodic refresh is also supported.
+
+Configuration
+-------------
 
-# Configuration
-## Setting up ```ResourceManager``` to enable ```Node Labels```:
+###Setting up ResourceManager to enable Node Labels
 
 Setup following properties in ```yarn-site.xml```
 
@@ -49,23 +72,50 @@ Property  | Value
 --- | ----
 yarn.node-labels.fs-store.root-dir  | hdfs://namenode:port/path/to/store/node-labels/
 yarn.node-labels.enabled | true
+yarn.node-labels.configuration-type | Set configuration type for node labels. Administrators can specify “centralized”, “delegated-centralized” or “distributed”. Default value is “centralized”.
 
 Notes:
 
 * Make sure ```yarn.node-labels.fs-store.root-dir``` is created and ```ResourceManager``` has permission to access it. (Typically from “yarn” user)
 * If user want to store node label to local file system of RM (instead of HDFS), paths like `file:///home/yarn/node-label` can be used
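For illustration, the RM-side properties above could be expressed in ```yarn-site.xml``` as follows (host, port and path are the same placeholders used in the table, to be adapted to your cluster):

```xml
<!-- Hypothetical yarn-site.xml fragment; namenode host/port and the
     store path are placeholders. -->
<property>
  <name>yarn.node-labels.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.node-labels.fs-store.root-dir</name>
  <value>hdfs://namenode:port/path/to/store/node-labels/</value>
</property>
<property>
  <name>yarn.node-labels.configuration-type</name>
  <value>centralized</value>
</property>
```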
 
-### Add/modify node labels list and node-to-labels mapping to YARN
+###Add/modify node labels list to YARN
+
 * Add cluster node labels list:
     * Executing ```yarn rmadmin -addToClusterNodeLabels "label_1(exclusive=true/false),label_2(exclusive=true/false)"``` to add node label.
-    * If user don’t specify “(exclusive=…)”, execlusive will be ```true``` by default.
+    * If user don’t specify “(exclusive=…)”, exclusive will be ```true``` by default.
     * Run ```yarn cluster --list-node-labels``` to check added node labels are visible in the cluster.
 
-* Add labels to nodes
+###Add/modify node-to-labels mapping to YARN
+
+* Configuring nodes to labels mapping in **Centralized** NodeLabel setup
     * Executing ```yarn rmadmin -replaceLabelsOnNode “node1[:port]=label1 node2=label2”```. Added label1 to node1, label2 to node2. If user don’t specify port, it added the label to all ```NodeManagers``` running on the node.
 
-## Configuration of Schedulers for node labels
-### Capacity Scheduler Configuration
+* Configuring nodes to labels mapping in **Distributed** NodeLabel setup
+
+Property  | Value
+----- | ------
+yarn.node-labels.configuration-type | Needs to be set as *"distributed"* in RM, to fetch node to labels mapping from a configured Node Labels Provider in NM.
+yarn.nodemanager.node-labels.provider | When *"yarn.node-labels.configuration-type"* is configured with *"distributed"* in RM, Administrators can configure the provider for the node labels by configuring this parameter in NM. Administrators can configure *"config"*, *"script"* or the *class name* of the provider. Configured class needs to extend *org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeLabelsProvider*. If *"config"* is configured, then *"ConfigurationNodeLabelsProvider"* and if *"script"* is configured, then *"ScriptNodeLabelsProvider"* will be used.
+yarn.nodemanager.node-labels.resync-interval-ms | Interval at which NM syncs its node labels with RM. NM will send its loaded labels along with the heartbeat to RM at every configured interval. This resync is required even when the labels are not modified because admin might have removed the cluster label which was provided by NM. Default is 2 mins.
+yarn.nodemanager.node-labels.provider.fetch-interval-ms | When *"yarn.nodemanager.node-labels.provider"* is configured with *"config"*, *"script"* or the *configured class* extends AbstractNodeLabelsProvider, then periodically node labels are retrieved from the node labels provider. This configuration is to define the interval. If -1 is configured, then node labels are retrieved from provider only during initialization. Defaults to 10 mins.
+yarn.nodemanager.node-labels.provider.fetch-timeout-ms | When *"yarn.nodemanager.node-labels.provider"* is configured with *"script"*, then this configuration provides the timeout period after which it will interrupt the script which queries the node labels. Defaults to 20 mins.
+yarn.nodemanager.node-labels.provider.script.path | The node label script to run. Script output lines starting with *"NODE_PARTITION:"* will be considered as the node label partition. In case multiple lines of script output have this pattern, then the last one will be considered.
+yarn.nodemanager.node-labels.provider.script.opts | The arguments to pass to the node label script.
+yarn.nodemanager.node-labels.provider.configured-node-partition | When *"yarn.nodemanager.node-labels.provider"* is configured with *"config"*, then ConfigurationNodeLabelsProvider fetches the partition label from this parameter.
+
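As a minimal sketch of the table above, a script-based distributed setup would set ```yarn.node-labels.configuration-type``` to *"distributed"* in the RM's ```yarn-site.xml``` and add the following to the NM's ```yarn-site.xml``` (the script path is a placeholder, not a value from this patch):

```xml
<!-- Hypothetical NM-side fragment for a script-based distributed setup;
     /path/to/node-label-script.sh is a placeholder. -->
<property>
  <name>yarn.nodemanager.node-labels.provider</name>
  <value>script</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider.script.path</name>
  <value>/path/to/node-label-script.sh</value>
</property>
```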
+* Configuring nodes to labels mapping in **Delegated-Centralized** NodeLabel setup
+
+Property  | Value
+----- | ------
+yarn.node-labels.configuration-type | Needs to be set as *"delegated-centralized"* to fetch node to labels mapping from a configured Node Labels Provider in RM.
+yarn.resourcemanager.node-labels.provider | When *"yarn.node-labels.configuration-type"* is configured with *"delegated-centralized"*, then administrators should configure the class for fetching node labels by ResourceManager. Configured class needs to extend *org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider*.
+yarn.resourcemanager.node-labels.provider.fetch-interval-ms | When *"yarn.node-labels.configuration-type"* is configured with *"delegated-centralized"*, then periodically node labels are retrieved from the node labels provider. This configuration is to define the interval. If -1 is configured, then node labels are retrieved from provider only once for each node after it registers. Defaults to 30 mins.
+
+###Configuration of Schedulers for node labels
+
+* Capacity Scheduler Configuration
+
 Property  | Value
 ----- | ------
 yarn.scheduler.capacity.`<queue-path>`.capacity | Set the percentage of the queue can access to nodes belong to DEFAULT partition. The sum of DEFAULT capacities for direct children under each parent, must be equal to 100.
@@ -114,27 +164,33 @@ Notes:
 * After finishing configuration of CapacityScheduler, execute ```yarn rmadmin -refreshQueues``` to apply changes
 * Go to scheduler page of RM Web UI to check if you have successfully set configuration.
 
-# Specifying node label for application
+Specifying node label for application
+-------------------------------------
+
 Applications can use following Java APIs to specify node label to request
 
 * `ApplicationSubmissionContext.setNodeLabelExpression(..)` to set node label expression for all containers of the application.
 * `ResourceRequest.setNodeLabelExpression(..)` to set node label expression for individual resource requests. This can overwrite node label expression set in ApplicationSubmissionContext
 * Specify `setAMContainerResourceRequest.setNodeLabelExpression` in `ApplicationSubmissionContext` to indicate expected node label for application master container.
 
-# Monitoring
+Monitoring
+----------
+
+###Monitoring through web UI
 
-## Monitoring through web UI
 Following label-related fields can be seen on web UI:
 
 * Nodes page: http://RM-Address:port/cluster/nodes, you can get labels on each node
 * Node labels page: http://RM-Address:port/cluster/nodelabels, you can get type (exclusive/non-exclusive), number of active node managers, total resource of each partition
 * Scheduler page: http://RM-Address:port/cluster/scheduler, you can get label-related settings of each queue, and resource usage of queue partitions.
 
-## Monitoring through commandline
+###Monitoring through commandline
 
 * Use `yarn cluster --list-node-labels` to get labels in the cluster
 * Use `yarn node -status <NodeId>` to get node status including labels on a given node
 
-# Useful links
+Useful links
+------------
+
 * [YARN Capacity Scheduler](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html), if you need more understanding about how to configure Capacity Scheduler
 * Write YARN application using node labels, you can see following two links as examples: [YARN distributed shell](https://issues.apache.org/jira/browse/YARN-2502), [Hadoop MapReduce](https://issues.apache.org/jira/browse/MAPREDUCE-6304)
