Repository: hadoop
Updated Branches:
  refs/heads/yarn-native-services 3f7a50d8d -> 281c1d1e8


YARN-7191. Improve yarn-service documentation. Contributed by Jian He


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/281c1d1e
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/281c1d1e
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/281c1d1e

Branch: refs/heads/yarn-native-services
Commit: 281c1d1e87aeea227a7174a72d9957d8af6faf07
Parents: 3f7a50d
Author: Billie Rinaldi <bil...@apache.org>
Authored: Wed Sep 27 15:08:33 2017 -0700
Committer: Billie Rinaldi <bil...@apache.org>
Committed: Wed Sep 27 15:08:33 2017 -0700

----------------------------------------------------------------------
 .../src/site/markdown/yarn-service/Concepts.md  |  47 +---
 .../src/site/markdown/yarn-service/Overview.md  |   3 +-
 .../site/markdown/yarn-service/QuickStart.md    |  34 +--
 .../site/markdown/yarn-service/RegistryDNS.md   | 166 +++++++++++++
 .../markdown/yarn-service/ServiceDiscovery.md   | 235 ++++++++-----------
 5 files changed, 286 insertions(+), 199 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/281c1d1e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Concepts.md
----------------------------------------------------------------------
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Concepts.md
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Concepts.md
index 7b62c36..e567d03 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Concepts.md
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Concepts.md
@@ -22,6 +22,8 @@ It also does all the heavy lifting work such as resolving the 
service definition
 failed containers, monitoring components' healthiness and readiness, ensuring 
dependency start order across components, flexing up/down components, 
 upgrading components etc. The end goal of the framework is to make sure the 
service is up and running as the state that user desired.
 
+In addition, it leverages many features in YARN core, such as
+affinity and anti-affinity scheduling, log aggregation for services, 
automatic restart of a failed container, and in-place upgrade of a 
container.
 
 ### A Restful API-Server for deploying/managing services on YARN
 A restful API server is developed to allow users to deploy/manage their 
services on YARN via a simple JSON spec. This avoids users
@@ -34,44 +36,11 @@ support HA, distribute the load etc.
 
 ### Service Discovery
 A DNS server is implemented to enable discovering services on YARN via the 
standard mechanism: DNS lookup.
-The DNS server essentially exposes the information in YARN service registry by 
translating them into DNS records such as A record and SRV record.
-Clients can discover the IPs of containers via standard DNS lookup.
-The previous read mechanisms of YARN Service Registry were limited to a 
registry specific (java) API and a REST interface and are difficult
-to wireup existing clients and services. The DNS based service discovery 
eliminates this gap. Please refer to this [DNS doc](ServiceDiscovery.md) 
-for more details.
-
-### Scheduling
-
-A host of scheduling features are being developed to support long running 
services.
-
-* Affinity and anti-affinity scheduling across containers 
([YARN-6592](https://issues.apache.org/jira/browse/YARN-6592)).
-* Container resizing 
([YARN-1197](https://issues.apache.org/jira/browse/YARN-1197))
-* Special handling of container preemption/reservation for services 
-
-### Container auto-restarts
-
-[YARN-3998](https://issues.apache.org/jira/browse/YARN-3998) implements a 
retry-policy to let NM re-launch a service container when it fails.
-The service REST API provides users a way to enable NodeManager to 
automatically restart the container if it fails.
-The advantage is that it avoids the entire cycle of releasing the failed 
containers, re-asking new containers, re-do resource localizations and so on, 
which
-greatly minimizes container downtime.
 
+The framework posts container information such as hostname and IP address into the 
[YARN service registry](../registry/index.md). The DNS server 
exposes the
+information in the YARN service registry by translating it into DNS records such 
as A records and SRV records.
+Clients can then discover the IPs of containers via standard DNS lookup.
 
-### Container in-place upgrade
-
-[YARN-4726](https://issues.apache.org/jira/browse/YARN-4726) aims to support 
upgrading containers in-place, that is, without losing the container 
allocations.
-It opens up a few APIs in NodeManager to allow ApplicationMasters to upgrade 
their containers via a simple API call.
-Under the hood, NodeManager does below steps:
-* Downloading the new resources such as jars, docker container images, new 
configurations.
-* Stop the old container. 
-* Start the new container with the newly downloaded resources. 
-
-At the time of writing this document, core changes are done but the feature is 
not usable end-to-end.
-
-### Resource Profiles
-
-In [YARN-3926](https://issues.apache.org/jira/browse/YARN-3926), YARN 
introduces Resource Profiles which extends the YARN resource model for easier 
-resource-type management and profiles. 
-It primarily solves two problems:
-* Make it easy to support new resource types such as network 
bandwith([YARN-2140](https://issues.apache.org/jira/browse/YARN-2140)), 
disks([YARN-2139](https://issues.apache.org/jira/browse/YARN-2139)).
- Under the hood, it unifies the scheduler codebase to essentially parameterize 
the resource types.
-* User can specify the container resource requirement by a profile name, 
rather than fiddling with varying resource-requirements for each resource type.
+The previous read mechanisms of the YARN Service Registry were limited to a 
registry-specific (Java) API and a REST interface, which made it difficult
+to wire up existing clients and services. DNS-based service discovery 
closes this gap. Please refer to the [Service Discovery 
doc](ServiceDiscovery.md)
+for more details.
\ No newline at end of file
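To illustrate the point in the Concepts.md text that clients need only a standard DNS lookup, here is a minimal Python sketch. It assumes the client's resolver already forwards the cluster zone to the registry DNS server; the function name `resolve_ips` is hypothetical, and `localhost` stands in for a real service name so the snippet runs anywhere.

```python
import socket
from typing import List


def resolve_ips(hostname: str) -> List[str]:
    """Return the distinct IP addresses the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # On a cluster, a name such as hbasemaster-0.hbase.devuser.yarncluster
    # would be passed instead of localhost.
    print(resolve_ips("localhost"))
```

No registry-specific API or REST client is involved; the lookup goes through the same resolver path as any other hostname.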

http://git-wip-us.apache.org/repos/asf/hadoop/blob/281c1d1e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Overview.md
----------------------------------------------------------------------
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Overview.md
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Overview.md
index 407fbc0..58daee5 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Overview.md
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/Overview.md
@@ -52,7 +52,8 @@ The benefits of combining these workloads are two-fold:
 
 * [Concepts](Concepts.md): Describes the internals of the framework and some 
features in YARN core to support running services on YARN.
 * [Service REST API](YarnServiceAPI.md): The API doc for deploying/managing 
services on YARN.
-* [Service Discovery](ServiceDiscovery.md): Deep dives into the YARN DNS 
internals.
+* [Service Discovery](ServiceDiscovery.md): Describes the service discovery 
mechanism on YARN.
+* [Registry DNS](RegistryDNS.md): Deep dives into the Registry DNS internals.
 * [Examples](Examples.md): List some example service definitions (`Yarnfile`).
 
 

http://git-wip-us.apache.org/repos/asf/hadoop/blob/281c1d1e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/QuickStart.md
----------------------------------------------------------------------
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/QuickStart.md
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/QuickStart.md
index ab415de..15df0cd 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/QuickStart.md
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/QuickStart.md
@@ -194,32 +194,10 @@ If you are building from source code, make sure you use 
`-Pyarn-ui` in the `mvn`
   </property>
 ```
 
-## Service Discovery with YARN DNS
-YARN Service framework comes with a DNS server (backed by YARN Service 
Registry) which enables DNS based discovery of services deployed on YARN.
-That is, user can simply access their services in a well-defined naming format 
as below:
+# Try with Docker
+The above example covers only a service based on non-Docker containers. The YARN 
Service Framework also provides first-class support for managing Docker-based 
services.
+Most of the steps for managing Docker-based services are the same, except that 
for Docker the `Artifact` type of a component is `DOCKER` and the Artifact `id` 
is the name of the Docker image.
+For details on how to set up Docker on YARN, please check [Docker on 
YARN](../DockerContainers.md).
 
-```
-${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}
-```
-For example, in a cluster whose domain name is `yarncluster` (as defined by 
the `hadoop.registry.dns.domain-name` in `yarn-site.xml`), a service named 
`hbase` deployed by user `dev` 
-with two components `hbasemaster` and `regionserver` can be accessed as below:
-
-This URL points to the usual hbase master UI
-```
-http://hbasemaster-0.hbase.dev.yarncluster:16010/master-status
-```
-
-
-Note that YARN service framework assigns COMPONENT_INSTANCE_NAME for each 
container in a sequence of monotonically increasing integers. For example, 
`hbasemaster-0` gets
-assigned `0` since it is the first and only instance for the `hbasemaster` 
component. In case of `regionserver` component, it can have multiple containers
- and so be named as such: `regionserver-0`, `regionserver-1`, `regionserver-2` 
... etc 
- 
-`Disclaimer`: The DNS implementation is still experimental. It should not be 
used as a fully-functional corporate DNS. 
-
-### Start the DNS server 
-By default, the DNS runs on non-privileged port `5353`.
-If it is configured to use the standard privileged port `53`, the DNS server 
needs to be run as root:
-```
-sudo su - -c "yarn org.apache.hadoop.registry.server.dns.RegistryDNSServer > 
/${HADOOP_LOG_FOLDER}/registryDNS.log 2>&1 &" root
-```
-Please refer to [YARN DNS doc](ServicesDiscovery.md) for the full list of 
configurations.
\ No newline at end of file
+Docker support also opens up new possibilities, such as 
discovering service containers on YARN with DNS.
+Check [ServiceDiscovery](ServiceDiscovery.md) for more details.
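As a rough sketch of the Docker paragraph in the QuickStart.md change, the Python snippet below builds a service definition whose component uses a Docker artifact. Only the `artifact` `type` (`DOCKER`) and `id` (image name) come from the text; the other field names (`name`, `components`, `number_of_containers`, `resource`) are assumptions modeled on the service REST API and should be checked against YarnServiceAPI.md. The service and image names are hypothetical.

```python
import json


def docker_component(name: str, image: str, containers: int = 1) -> dict:
    """Build one component entry whose artifact points at a Docker image."""
    return {
        "name": name,
        "number_of_containers": containers,  # assumed field name
        # From the text: the type is DOCKER and the id is the image name.
        "artifact": {"id": image, "type": "DOCKER"},
        # Assumed resource block; the values are purely illustrative.
        "resource": {"cpus": 1, "memory": "256"},
    }


spec = {
    "name": "httpd-service",  # hypothetical service name
    "components": [docker_component("httpd", "example/httpd:latest", 2)],
}
print(json.dumps(spec, indent=2))
```

Keeping the artifact in a small helper makes it easy to see that switching a component to Docker changes only the `artifact` entry.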

http://git-wip-us.apache.org/repos/asf/hadoop/blob/281c1d1e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/RegistryDNS.md
----------------------------------------------------------------------
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/RegistryDNS.md
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/RegistryDNS.md
new file mode 100644
index 0000000..ef395fc
--- /dev/null
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/RegistryDNS.md
@@ -0,0 +1,166 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Registry DNS Server
+
+<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
+
+## Introduction
+
+The Registry DNS Server provides a standard DNS interface to the information 
posted into the YARN Registry by deployed applications. The DNS service serves 
the following functions:
+
+1. **Exposing existing service-discovery information via DNS** - Information 
provided in
+the current YARN service registry’s records will be converted into DNS 
entries, thus
+allowing users to discover information about YARN applications using standard 
DNS
+client mechanisms (e.g. a DNS SRV Record specifying the hostname and port
+number for services).
+2. **Enabling Container to IP mappings** - Enables discovery of the IPs of 
containers via
+standard DNS lookups. Given the availability of the records via DNS, container
+name-based communication will be facilitated (e.g. `curl
+http://solr-0.solr-service.devuser.yarncluster:8983/solr/admin/collections?action=LIST`).
+
+## Service Properties
+
+The existing YARN Service Registry is leveraged as the source of information 
for the DNS Service.
+
+The following core functions are supported by the DNS-Server:
+
+### Functional properties
+
+1. Supports creation of DNS records for end-points of the deployed YARN 
applications
+2. Record names remain unchanged during restart of containers and/or 
applications
+3. Supports reverse lookups (name based on IP). Note, this works only for
+Docker containers because other containers share the IP of the host
+4. Supports security using the standards defined by The Domain Name System 
Security
+Extensions (DNSSEC)
+5. Highly available
+6. Scalable - The service provides the responsiveness (e.g. low-latency) 
required to
+respond to DNS queries (timeouts yield attempts to invoke other configured name
+servers).
+
+### Deployment properties
+
+1. Supports integration with existing DNS assets (e.g. a corporate DNS server) 
by acting as
+a DNS server for a Hadoop cluster zone/domain. The server is not intended to 
act as a
+primary DNS server and does not forward requests to other servers. Rather, a
+primary DNS server can be configured to forward a zone to the registry DNS
+server.
+2. The DNS Server exposes a port that can receive both TCP and UDP requests per
+DNS standards. The server's default port (5353) is not in the restricted
+range. However, existing DNS assets may only allow zone forwarding to
+non-custom ports (such as the standard DNS port 53). To support this, the registry DNS server can be started in
+privileged mode.
+
+## DNS Record Name Structure
+
+The DNS names of generated records are composed from the following elements
+(labels). Note that these elements must be compatible with DNS conventions
+(see “Preferred Name Syntax” in [RFC 
1035](https://www.ietf.org/rfc/rfc1035.txt)):
+
+* **domain** - the name of the cluster DNS domain. This name is provided as a
+configuration property. In addition, it is this name that is configured at a 
parent DNS
+server as the zone name for the defined registry DNS zone (the zone for which
+the parent DNS server will forward requests to registry DNS). E.g. 
yarncluster.com
+* **username** - the name of the application deployer. This name is the simple 
short-name (e.g.
+the primary component of the Kerberos principal) associated with the user 
launching
+the application. As the username is one of the elements of DNS names, it is 
expected
+that this also conforms to DNS name conventions (RFC 1035 linked above), so it
+is converted to a valid DNS hostname entry using the punycode convention 
used
+for internationalized DNS.
+* **application name** - the name of the deployed YARN application. This name 
is inferred
+from the YARN registry path to the application's node. Application name,
+rather than application id, was chosen as a way of making it easy for users to 
refer to human-readable DNS
+names. This obviously mandates certain uniqueness properties on application 
names.
+* **container id** - the YARN assigned ID to a container (e.g.
+container_e3741_1454001598828_01_000004)
+* **component name** - the name assigned to the deployed component (e.g. a 
master
+component). A component is a distributed element of an application or service 
that is
+launched in a YARN container (e.g. an HBase master). One can imagine multiple
+components within an application. A component name is not yet a first class 
concept in
+YARN, but is a very useful one that we are introducing here for the sake of 
registry DNS
+entries. Many frameworks like MapReduce, Slider already have component names
+(though, as mentioned, they are not yet supported in YARN in a first class 
fashion).
+* **api** - the api designation for the exposed endpoint
+
+### Notes about DNS Names
+
+* In most instances, the DNS names can be easily distinguished by the number of
+elements/labels that compose the name. The cluster’s domain name is always 
the last
+element. After that element is parsed out, reading from right to left, the 
first element
+maps to the application user and so on. Wherever it is not easily 
distinguishable, naming conventions are used to disambiguate the name using a 
prefix such as
+“container” or suffix such as “api”. For example, an endpoint 
published as a
+management endpoint will be referenced with the name 
*management-api.griduser.yarncluster.com*.
+* Unique application name (per user) is not currently supported/guaranteed by 
YARN, but
+it is supported by frameworks such as Apache Slider. The registry DNS service 
currently
+leverages the last element of the ZK path entry for the application as an
+application name. These application names have to be unique for a given user.
+
+## DNS Server Functionality
+
+The primary functions of the DNS service are illustrated in the following 
diagram:
+
+![DNS Functional Overview](../images/dns_overview.png "DNS Functional 
Overview")
+
+### DNS record creation
+The following figure illustrates at slightly greater detail the DNS record 
creation and registration sequence (NOTE: service record updates would follow a 
similar sequence of steps,
+distinguished only by the different event type):
+
+![DNS Functional Overview](../images/dns_record_creation.jpeg "DNS Functional 
Overview")
+
+### DNS record removal
+Record removal follows a similar sequence:
+
+![DNS Functional Overview](../images/dns_record_removal.jpeg "DNS Functional 
Overview")
+
+(NOTE: The DNS Zone requires a record as an argument for the deletion method, 
thus
+requiring similar parsing logic to identify the specific records that should 
be removed).
+
+### DNS Service initialization
+* The DNS service initializes both UDP and TCP listeners on a configured port.
+If a port in the restricted range is desired (such as the standard DNS port
+53), the DNS service can be launched using jsvc as described in the section
+on starting the DNS server.
+* Subsequently, the DNS service listens for inbound DNS requests. Those 
requests are
+standard DNS requests from users or other DNS servers (for example, DNS 
servers that have the
+RegistryDNS service configured as a forwarder).
+
+## Start the DNS Server
+By default, the DNS server runs on non-privileged port `5353`. Start the server
+with:
+```
+yarn --daemon start registrydns
+```
+
+If the DNS server is configured to use the standard privileged port `53`, the
+environment variables YARN\_REGISTRYDNS\_SECURE\_USER and
+YARN\_REGISTRYDNS\_SECURE\_EXTRA\_OPTS must be uncommented in the yarn-env.sh
+file. The DNS server should then be launched as root and jsvc will be used to
+reduce the privileges of the daemon after the port has been bound.
+
+## Configuration
+The Registry DNS server reads its configuration properties from the 
yarn-site.xml file.  The following are the DNS associated configuration 
properties:
+
+| Name | Description |
+| ------------ | ------------- |
+| hadoop.registry.dns.enabled | The DNS functionality is enabled for the 
cluster. Default is false. |
+| hadoop.registry.dns.domain-name  | The domain name for Hadoop cluster 
associated records.  |
+| hadoop.registry.dns.bind-address | Address associated with the network 
interface to which the DNS listener should bind.  |
+| hadoop.registry.dns.bind-port | The port number for the DNS listener. The 
default port is 5353.  |
+| hadoop.registry.dns.dnssec.enabled | Indicates whether the DNSSEC support is 
enabled. Default is false.  |
+| hadoop.registry.dns.public-key  | The base64 representation of the 
server’s public key. Leveraged for creating the DNSKEY Record provided for 
DNSSEC client requests.  |
+| hadoop.registry.dns.private-key-file  | The path to the standard DNSSEC 
private key file. Must only be readable by the DNS launching identity. See 
[dnssec-keygen](https://ftp.isc.org/isc/bind/cur/9.9/doc/arm/man.dnssec-keygen.html)
 documentation.  |
+| hadoop.registry.dns-ttl | The default TTL value to associate with DNS 
records. The default value is set to 1 (a value of 0 has undefined behavior). A 
typical value should approximate the time it takes YARN to restart a 
failed container.  |
+| hadoop.registry.dns.zone-subnet  | An indicator of the IP range associated 
with the cluster containers. The setting is utilized for the generation of the 
reverse zone name.  |
+| hadoop.registry.dns.zone-mask | The network mask associated with the zone IP 
range.  If specified, it is utilized to ascertain the IP range possible and 
come up with an appropriate reverse zone name. |
+| hadoop.registry.dns.zones-dir | A directory containing zone configuration 
files to read during zone initialization.  This directory can contain zone 
master files named *zone-name.zone*.  See 
[here](http://www.zytrax.com/books/dns/ch6/mydomain.html) for zone master file 
documentation.|
\ No newline at end of file
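To make the record name structure and the reverse-zone settings in RegistryDNS.md more concrete, here is a small Python sketch. It joins the labels in the order used by the `solr-0.solr-service.devuser.yarncluster` example and derives a conventional `in-addr.arpa` zone name from a subnet and mask. How the registry DNS server itself computes the reverse zone is not shown in this change, so `reverse_zone_name` is an assumption that handles only octet-aligned masks; both function names and the sample subnet are hypothetical.

```python
import ipaddress


def record_name(component_instance: str, service: str, user: str, domain: str) -> str:
    """Join the labels from most specific to the cluster domain."""
    return ".".join([component_instance, service, user, domain])


def reverse_zone_name(subnet: str, mask: str) -> str:
    """Build an in-addr.arpa zone name for an octet-aligned IPv4 subnet."""
    network = ipaddress.IPv4Network(f"{subnet}/{mask}", strict=False)
    if network.prefixlen not in (8, 16, 24):
        raise ValueError("this sketch only handles /8, /16 and /24 masks")
    kept = network.prefixlen // 8
    # Keep the network octets and reverse them, as in-addr.arpa requires.
    octets = str(network.network_address).split(".")[:kept]
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(record_name("solr-0", "solr-service", "devuser", "yarncluster"))
# Hypothetical values for hadoop.registry.dns.zone-subnet and zone-mask.
print(reverse_zone_name("172.17.0.0", "255.255.255.0"))
```

Restricting the sketch to octet-aligned masks keeps the zone name unambiguous; other prefix lengths need delegation conventions that are outside the scope of this text.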

http://git-wip-us.apache.org/repos/asf/hadoop/blob/281c1d1e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/ServiceDiscovery.md
----------------------------------------------------------------------
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/ServiceDiscovery.md
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/ServiceDiscovery.md
index 6318a07..a5dd0d2 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/ServiceDiscovery.md
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/ServiceDiscovery.md
@@ -12,139 +12,112 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-# YARN DNS Server
-
-<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
-
-## Introduction
-
-The YARN DNS Server provides a standard DNS interface to the information 
posted into the YARN Registry by deployed applications. The DNS service serves 
the following functions:
-
-1. **Exposing existing service-discovery information via DNS** - Information 
provided in
-the current YARN service registry’s records will be converted into DNS 
entries, thus
-allowing users to discover information about YARN applications using standard 
DNS
-client mechanisms (for e.g. a DNS SRV Record specifying the hostname and port
-number for services).
-2. **Enabling Container to IP mappings** - Enables discovery of the IPs of 
containers via
-standard DNS lookups. Given the availability of the records via DNS, container
-name-based communication will be facilitated (e.g. ‘curl
-http://myContainer.myDomain.com/endpoint’).
-
-## Service Properties
-
-The existing YARN Service Registry is leveraged as the source of information 
for the DNS Service.
-
-The following core functions are supported by the DNS-Server:
-
-### Functional properties
-
-1. Supports creation of DNS records for end-points of the deployed YARN 
applications
-2. Record names remain unchanged during restart of containers and/or 
applications
-3. Supports reverse lookups (name based on IP). Note, this works only for 
Docker containers.
-4. Supports security using the standards defined by The Domain Name System 
Security
-Extensions (DNSSEC)
-5. Highly available
-6. Scalable - The service provides the responsiveness (e.g. low-latency) 
required to
-respond to DNS queries (timeouts yield attempts to invoke other configured name
-servers).
-
-### Deployment properties
-
-1. Supports integration with existing DNS assets (e.g. a corporate DNS server) 
by acting as
-a DNS server for a Hadoop cluster zone/domain. The server is not intended to 
act as a
-primary DNS server and does not forward requests to other servers.
-2. The DNS Server exposes a port that can receive both TCP and UDP requests per
-DNS standards. The default port for DNS protocols is in a restricted, 
administrative port
-range (5353), so the port is configurable for deployments in which the service 
may
-not be managed via an administrative account.
-
-## DNS Record Name Structure
-
-The DNS names of generated records are composed from the following elements 
(labels). Note that these elements must be compatible with DNS conventions (see 
“Preferred Name Syntax” in RFC 1035):
-
-* **domain** - the name of the cluster DNS domain. This name is provided as a
-configuration property. In addition, it is this name that is configured at a 
parent DNS
-server as the zone name for the defined yDNS zone (the zone for which the 
parent DNS
-server will forward requests to yDNS). E.g. yarncluster.com
-* **username** - the name of the application deployer. This name is the simple 
short-name (for
-e.g. the primary component of the Kerberos principal) associated with the user 
launching
-the application. As the username is one of the elements of DNS names, it is 
expected
-that this also confirms DNS name conventions (RFC 1035 linked above), so 
special translation is performed for names with special characters like hyphens 
and spaces.
-* **application name** - the name of the deployed YARN application. This name 
is inferred
-from the YARN registry path to the application's node. Application name, 
rather thn application id, was chosen as a way of making it easy for users to 
refer to human-readable DNS
-names. This obviously mandates certain uniqueness properties on application 
names.
-* **container id** - the YARN assigned ID to a container (e.g.
-container_e3741_1454001598828_01_000004)
-* **component name** - the name assigned to the deployed component (for e.g. a 
master
-component). A component is a distributed element of an application or service 
that is
-launched in a YARN container (e.g. an HBase master). One can imagine multiple
-components within an application. A component name is not yet a first class 
concept in
-YARN, but is a very useful one that we are introducing here for the sake of 
yDNS
-entries. Many frameworks like MapReduce, Slider already have component names
-(though, as mentioned, they are not yet supported in YARN in a first class 
fashion).
-* **api** - the api designation for the exposed endpoint
-
-### Notes about DNS Names
-
-* In most instances, the DNS names can be easily distinguished by the number of
-elements/labels that compose the name. The cluster’s domain name is always 
the last
-element. After that element is parsed out, reading from right to left, the 
first element
-maps to the application user and so on. Wherever it is not easily 
distinguishable, naming conventions are used to disambiguate the name using a 
prefix such as
-“container” or suffix such as “api”. For example, an endpoint 
published as a
-management endpoint will be referenced with the name 
*management-api.griduser.yarncluster.com*.
-* Unique application name (per user) is not currently supported/guaranteed by 
YARN, but
-it is supported by frameworks such as Apache Slider. The yDNS service currently
-leverages the last element of the ZK path entry for the application as an
-application name. These application names have to be unique for a given user.
-
-## DNS Server Functionality
-
-The primary functions of the DNS service are illustrated in the following 
diagram:
-
-![DNS Functional Overview](../images/dns_overview.png "DNS Functional 
Overview")
-
-### DNS record creation
-The following figure illustrates at slightly greater detail the DNS record 
creation and registration sequence (NOTE: service record updates would follow a 
similar sequence of steps,
-distinguished only by the different event type):
-
-![DNS Functional Overview](../images/dns_record_creation.jpeg "DNS Functional 
Overview")
-
-### DNS record removal
-Similarly, record removal follows a similar sequence
-
-![DNS Functional Overview](../images/dns_record_removal.jpeg "DNS Functional 
Overview")
-
-(NOTE: The DNS Zone requires a record as an argument for the deletion method, 
thus
-requiring similar parsing logic to identify the specific records that should 
be removed).
-
-### DNS Service initialization
-* The DNS service initializes both UDP and TCP listeners on a configured port. 
As
-noted above, the default port of 5353 is in a restricted range that is only 
accessible to an
-account with administrative privileges.
-* Subsequently, the DNS service listens for inbound DNS requests. Those 
requests are
-standard DNS requests from users or other DNS servers (for example, DNS 
servers that have the
-YARN DNS service configured as a forwarder).
+# Service Discovery
+
+This document describes the mechanism of service discovery on YARN and the
+steps for enabling it.
+
+## Overview
+A [DNS server](RegistryDNS.md) is implemented to enable discovering services 
on YARN via
+the standard mechanism: DNS lookup.
+
+The framework ApplicationMaster posts the container information such as 
hostname and IP address into
+the YARN service registry. The DNS server exposes the information in YARN 
service registry by translating them into DNS
+records such as A record and SRV record. Clients can then discover the IPs of 
containers via standard DNS lookup.
+
+For non-docker containers (containers with null `Artifact` or with `Artifact` 
type set to `TARBALL`), since all containers on the same host share the same IP 
address,
+the DNS server supports forward DNS lookup but not reverse DNS lookup.
+With Docker, it supports both forward and reverse lookup, since each container
+can be configured to have its own unique IP. In addition, the DNS server also 
supports configuring static zone files for both forward and reverse lookup.
+
+## Docker Container IP Management in Cluster
+To support the use case of one IP per container, containers must be launched 
with the `bridge` network. However, with the `bridge` network, containers
+running on one node are not routable from other nodes by default. This is not 
an issue for single-node testing, but in
+a multi-node environment, containers must be made routable from other nodes.
+
+There are several approaches to solve this, depending on the platform, such as GCE 
or AWS. Please refer to the platform-specific documentation for how to enable this.
+For an on-prem cluster, one way to solve this issue is to configure, on each node, 
the docker daemon to use a custom bridge, say `br0`, which is routable from all 
nodes.
+Also, assign an exclusive, contiguous range of IP addresses expressed in CIDR 
form, e.g. `172.21.195.240/26 (64 IPs)`, to each docker
+daemon using the `fixed-cidr` option as shown below in the docker `daemon.json`:
+```
+"bridge": "br0",
+"fixed-cidr": "172.21.195.240/26"
+```
+Check how to [customize docker bridge 
network](https://docs.docker.com/engine/userguide/networking/default_network/custom-docker0/)
 for details.
+
+
+## Naming Convention with Registry DNS
+With DNS support, users can access their services using a well-defined 
naming format, as shown below:
+
+```
+${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}
+```
+For example, in a cluster whose domain name is `yarncluster` (as defined by the `hadoop.registry.dns.domain-name` property in `yarn-site.xml`), a service named `hbase` deployed by user `devuser`
+with two components, `hbasemaster` and `regionserver`, can be accessed as below:
+
+This URL points to the usual hbase master UI:
+```
+http://hbasemaster-0.hbase.devuser.yarncluster:16010/master-status
+```
+
+
+Note that the YARN service framework assigns `COMPONENT_INSTANCE_NAME` for each container in a sequence of monotonically increasing integers. For example, `hbasemaster-0` gets
+assigned `0` since it is the first and only instance of the `hbasemaster` component. The `regionserver` component can have multiple containers,
+which are named `regionserver-0`, `regionserver-1`, `regionserver-2`, and so on.
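The naming format and the instance numbering described above can be sketched in a few lines of Python. The helper names `instance_hostname` and `component_hostnames` are hypothetical and exist only for illustration; the values are the ones used in the example.

```python
from typing import List


def instance_hostname(component: str, index: int, service: str,
                      user: str, domain: str) -> str:
    """Build ${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}."""
    return f"{component}-{index}.{service}.{user}.{domain}"


def component_hostnames(component: str, count: int, service: str,
                        user: str, domain: str) -> List[str]:
    """Hostnames for `count` instances of a component, numbered from 0."""
    return [instance_hostname(component, i, service, user, domain)
            for i in range(count)]
```

For instance, `component_hostnames("regionserver", 3, "hbase", "devuser", "yarncluster")` yields the names for `regionserver-0` through `regionserver-2`.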
+
+`Disclaimer`: The DNS implementation is still experimental. It should not be 
used as a fully-functional DNS.
+
+
+## Configure Registry DNS
+
+Below is the set of configurations in `yarn-site.xml` required for enabling 
Registry DNS. A full list of properties can be found in the Configuration
+section of [Registry DNS](RegistryDNS.md).
+
+```
+  <property>
+    <description>The domain name for Hadoop cluster associated 
records.</description>
+    <name>hadoop.registry.dns.domain-name</name>
+    <value>ycluster</value>
+  </property>
+
+  <property>
+    <description>The port number for the DNS listener. The default port is 5353.
+    If the standard privileged port 53 is used, make sure to start the DNS with jsvc support.</description>
+    <name>hadoop.registry.dns.bind-port</name>
+    <value>53</value>
+  </property>
+
+  <property>
+    <description>The DNS functionality is enabled for the cluster. Default is 
false.</description>
+    <name>hadoop.registry.dns.enabled</name>
+    <value>true</value>
+  </property>
+
+  <property>
+    <description>The network mask associated with the zone IP range. If 
specified, it is utilized to ascertain the
+    IP range possible and come up with an appropriate reverse zone 
name.</description>
+    <name>hadoop.registry.dns.zone-mask</name>
+    <value>255.255.255.0</value>
+  </property>
+
+  <property>
+    <description>An indicator of the IP range associated with the cluster 
containers. The setting is utilized for the
+     generation of the reverse zone name.</description>
+    <name>hadoop.registry.dns.zone-subnet</name>
+    <value>172.17.0</value>
+  </property>
+
+```
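To see how `zone-subnet` and `zone-mask` relate to a reverse zone name, the sketch below applies the general `in-addr.arpa` convention with Python's standard `ipaddress` module. It is an illustration of that convention under stated assumptions, not the exact algorithm used by Registry DNS; the function names are hypothetical, and the sample container IP `172.17.0.19` is made up for the example.

```python
import ipaddress


def reverse_zone_name(subnet_prefix: str, mask: str) -> str:
    """Derive an in-addr.arpa zone name from a subnet prefix and a netmask."""
    octets = subnet_prefix.split(".")
    octets += ["0"] * (4 - len(octets))          # pad "172.17.0" to "172.17.0.0"
    padded = ".".join(octets)
    net = ipaddress.ip_network(f"{padded}/{mask}", strict=False)
    keep = net.prefixlen // 8                    # whole octets covered by the mask
    net_octets = str(net.network_address).split(".")[:keep]
    if not net_octets:
        return "in-addr.arpa"
    return ".".join(reversed(net_octets)) + ".in-addr.arpa"


def ptr_name(ip: str) -> str:
    """Name under which the PTR record for an IP address would be queried."""
    return ipaddress.ip_address(ip).reverse_pointer
```

With the values from the configuration above (`172.17.0` and `255.255.255.0`), the zone name comes out as `0.17.172.in-addr.arpa`, and a container at `172.17.0.19` would be looked up as `19.0.17.172.in-addr.arpa`.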
 
 ## Start the DNS Server
-By default, the DNS runs on non-privileged port `5353`.
-If it is configured to use the standard privileged port `53`, the DNS server 
needs to be run as root:
+By default, the DNS server runs on non-privileged port `5353`. Start the server
+with:
 ```
-sudo su - -c "yarn org.apache.hadoop.registry.server.dns.RegistryDNSServer > 
/${HADOOP_LOG_FOLDER}/registryDNS.log 2>&1 &" root
+yarn --daemon start registrydns
 ```
 
-## Configuration
-The YARN DNS server reads its configuration properties from the yarn-site.xml 
file.  The following are the DNS associated configuration properties:
-
-| Name | Description |
-| ------------ | ------------- |
-| hadoop.registry.dns.enabled | The DNS functionality is enabled for the 
cluster. Default is false. |
-| hadoop.registry.dns.domain-name  | The domain name for Hadoop cluster 
associated records.  |
-| hadoop.registry.dns.bind-address | Address associated with the network 
interface to which the DNS listener should bind.  |
-| hadoop.registry.dns.bind-port | The port number for the DNS listener. The 
default port is 5353. However, since that port falls in a administrator-only 
range, typical deployments may need to specify an alternate port.  |
-| hadoop.registry.dns.dnssec.enabled | Indicates whether the DNSSEC support is 
enabled. Default is false.  |
-| hadoop.registry.dns.public-key  | The base64 representation of the 
server’s public key. Leveraged for creating the DNSKEY Record provided for 
DNSSEC client requests.  |
-| hadoop.registry.dns.private-key-file  | The path to the standard DNSSEC 
private key file. Must only be readable by the DNS launching identity. See 
[dnssec-keygen](https://ftp.isc.org/isc/bind/cur/9.9/doc/arm/man.dnssec-keygen.html)
 documentation.  |
-| hadoop.registry.dns-ttl | The default TTL value to associate with DNS 
records. The default value is set to 1 (a value of 0 has undefined behavior). A 
typical value should be approximate to the time it takes YARN to restart a 
failed container.  |
-| hadoop.registry.dns.zone-subnet  | An indicator of the IP range associated 
with the cluster containers. The setting is utilized for the generation of the 
reverse zone name.  |
-| hadoop.registry.dns.zone-mask | The network mask associated with the zone IP 
range.  If specified, it is utilized to ascertain the IP range possible and 
come up with an appropriate reverse zone name. |
-| hadoop.registry.dns.zones-dir | A directory containing zone configuration 
files to read during zone initialization.  This directory can contain zone 
master files named *zone-name.zone*.  See 
[here](http://www.zytrax.com/books/dns/ch6/mydomain.html) for zone master file 
documentation.|
+If the DNS server is configured to use the standard privileged port `53`, the
+environment variables `YARN_REGISTRYDNS_SECURE_USER` and
+`YARN_REGISTRYDNS_SECURE_EXTRA_OPTS` must be uncommented in the `yarn-env.sh`
+file. The DNS server should then be launched as `root` and jsvc will be used to
+reduce the privileges of the daemon after the port has been bound.


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-commits-h...@hadoop.apache.org
