Author: stevel
Date: Wed Mar 25 17:59:26 2015
New Revision: 1669188
URL: http://svn.apache.org/r1669188
Log:
SLIDER-824 Update documentation with new placement configuration options
Added:
incubator/slider/site/trunk/content/developing/chaosmonkey.md
- copied unchanged from r1668701,
incubator/slider/site/trunk/content/docs/slider_specs/chaosmonkey.md
incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md
incubator/slider/site/trunk/content/docs/configuration/internal.md
incubator/slider/site/trunk/content/docs/configuration/resource_specification.md
- copied, changed from r1668701,
incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md
incubator/slider/site/trunk/content/docs/configuration/revision-1/
incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md
- copied, changed from r1660946,
incubator/slider/site/trunk/content/docs/configuration/index.md
incubator/slider/site/trunk/content/docs/configuration/revision-1/original-hbase.json
- copied unchanged from r1660946,
incubator/slider/site/trunk/content/docs/configuration/original-hbase.json
incubator/slider/site/trunk/content/docs/configuration/revision-1/proposed-hbase.json
- copied unchanged from r1660946,
incubator/slider/site/trunk/content/docs/configuration/proposed-hbase.json
incubator/slider/site/trunk/content/docs/configuration/revision-1/redesign.md
- copied unchanged from r1660946,
incubator/slider/site/trunk/content/docs/configuration/redesign.md
incubator/slider/site/trunk/content/docs/configuration/revision-1/specification.md
- copied unchanged from r1660946,
incubator/slider/site/trunk/content/docs/configuration/specification.md
Removed:
incubator/slider/site/trunk/content/docs/configuration/original-hbase.json
incubator/slider/site/trunk/content/docs/configuration/proposed-hbase.json
incubator/slider/site/trunk/content/docs/configuration/redesign.md
incubator/slider/site/trunk/content/docs/configuration/specification.md
incubator/slider/site/trunk/content/docs/slider_specs/chaosmonkey.md
incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md
Modified:
incubator/slider/site/trunk/content/design/rolehistory.md
incubator/slider/site/trunk/content/developing/index.md
incubator/slider/site/trunk/content/docs/client-configuration.md
incubator/slider/site/trunk/content/docs/configuration/core.md
incubator/slider/site/trunk/content/docs/configuration/index.md
incubator/slider/site/trunk/content/docs/examples.md
incubator/slider/site/trunk/content/docs/getting_started.md
incubator/slider/site/trunk/content/docs/high_availability.md
incubator/slider/site/trunk/content/docs/security.md
incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md
incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md
incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md
incubator/slider/site/trunk/content/docs/slider_specs/index.md
incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md
incubator/slider/site/trunk/content/docs/ssl.md
Modified: incubator/slider/site/trunk/content/design/rolehistory.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/rolehistory.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/design/rolehistory.md (original)
+++ incubator/slider/site/trunk/content/design/rolehistory.md Wed Mar 25
17:59:26 2015
@@ -35,20 +35,23 @@ that have reached their escalation timeo
1. Such requests are cancelled and "relaxed" requests re-issued.
1. Labels are always respected; even relaxed requests use any labels specified
in `resources.json`
1. If a node is considered unreliable (as per-the slider 0.70 changes), it is
not used in the initial
-request.
+request. YARN may still allocate relaxed instances on such nodes. That is:
there is no explicit
+blacklisting, merely deliberate exclusion of unreliable nodes from explicitly
placed requests.
-#### `strict` placement
+#### Placement policies
+
+`strict` placement
Again, "strict placement" has a different policy: once a component has been
deployed on a node,
one component request will be made against that node, even if it is considered
unreliable. No
relaxation of the request will ever take place.
-#### `none` placement
+`none` placement
If the placement policy is "none", the request will always be relaxed.
While tracking of recent failure counts takes place, it is not used in
placement requests.
-#### `anti-affine` placement
+`anti-affine` placement
There's still no explicit support for this in YARN or slider. As noted above,
Slider does
try to spread placement when rebuilding an application, but otherwise it
accepts which
Modified: incubator/slider/site/trunk/content/developing/index.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/developing/index.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/developing/index.md (original)
+++ incubator/slider/site/trunk/content/developing/index.md Wed Mar 25 17:59:26
2015
@@ -33,6 +33,7 @@ Slider
* [Submitting Patches](submitting_patches.html)
* [Windows Development and Testing](windows.html)
* [Demo Script](demo.html)
+* [Configuring the Slider Chaos Monkey](chaosmonkey.html)
## Historical Documents
Modified: incubator/slider/site/trunk/content/docs/client-configuration.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/client-configuration.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/client-configuration.md (original)
+++ incubator/slider/site/trunk/content/docs/client-configuration.md Wed Mar 25
17:59:26 2015
@@ -328,4 +328,4 @@ and,
2. in a secure cluster, the security flag (`slider.security.enabled`)
and the HDFS Kerberos principal.
-3. The yarn registry options.
+3. The YARN registry options.
Added: incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md?rev=1669188&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md
(added)
+++ incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md
Wed Mar 25 17:59:26 2015
@@ -0,0 +1,16 @@
+<!---
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+# title
+
\ No newline at end of file
Modified: incubator/slider/site/trunk/content/docs/configuration/core.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/core.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/configuration/core.md (original)
+++ incubator/slider/site/trunk/content/docs/configuration/core.md Wed Mar 25
17:59:26 2015
@@ -15,7 +15,7 @@
limitations under the License.
-->
-# Apache Slider Core Configuration Specification
+# Apache Slider Core Configuration Specification, version 2.0
## Terminology
@@ -45,6 +45,9 @@ and what their resource requirements are
size of the application in terms of its component requirements: how many,
and what their resource requirements are.
+*`internal.json`*: A file which contains Slider's internal configuration
+parameters.
+
## Structure
Configurations are stored in well-formed JSON files.
@@ -63,9 +66,9 @@ The JSON specification files all have a
1. A component section, `/components`.
1. 0 or more sections under `/components` for each component, identified by
component name,
containing string properties.
-1. 0 or 1 section `/metadata` containing arbitrary metadata (such as a
description,
+1. An optional section `/metadata` containing arbitrary metadata (such as a
description,
author, or any other information that is not parsed or processed directly).
-
+1. An optional section, `/credentials` containing security information.
The simplest valid specification file is
@@ -288,37 +291,6 @@ the master component, using 1 vcore and
each using one vcore and 512 MB of RAM.
-## Internal information, `internal.json`
-
-This contains internal data related to the deployment -it is not
-intended for manual editing.
-
-There MAY be a component, `diagnostics`. If defined, its content contains
-diagnostic information for support calls, and MUST NOT be interpreted
-during application deployment, (though it may be included in the generation
-of diagnostics reports)
-
-
- {
- "schema": "http://example.org/specification/v2.0.0",
-
- "metadata": {
- "description": "Internal configuration DO NOT EDIT"
- },
- "global": {
- "name": "small_cluster",
- "application": "hdfs://cluster:8020/apps/hbase/v/1.0.0/application.tar"
- },
- "components": {
-
- "diagnostics": {
- "create.hadoop.deployed.info": "(release-2.3.0) @dfe463",
- "create.hadoop.build.info": "2.3.0",
- "create.time.millis": "1393512091276",
- "create.time": "27 Feb 2014 14:41:31 GMT"
- }
- }
- }
## Deployment specification: `app_configuration.json`
@@ -351,6 +323,8 @@ application, and instances of the indivi
"jvm.heapsize": "512M"
}
}
+      "credentials": {
+ }
}
The resolved specification defines the values that are passed to the
@@ -397,6 +371,8 @@ different components.
"jvm.heapsize": "512M"
}
}
+      "credentials": {
+ }
}
The `site.` properties have been passed down to each component, components
@@ -407,17 +383,29 @@ there is no way to declare an attribute
of the author of the configuration file (and their tools) to detect such
issues.
### Key Application Configuration Items
-The following sections provides details about certain application
configuration properties that can be utilized to tailor the deployment of a
given application:
+
+The following sections provide details about certain application configuration
+ properties that can be utilized to tailor the deployment of a given
application:
#### Controlling assigned port ranges
-For certain deployments, the ports available for communication with clients
(Web UI ports, RPC ports, etc) are restricted to a specific set (e.g when
leveraging a firewall). In those situations you can designate the set of
allowed ports with the "site.global.slider.allowed.ports" setting. This
settings takes a comma-delimited set of port numbers and ranges, e.g.:
+
+For certain deployments, the ports available for communication with clients
+(Web UI ports, RPC ports, etc.) are restricted to a specific set (e.g. when
using a firewall).
+In those situations you can designate the set of allowed ports with the
+`site.global.slider.allowed.ports` setting.
+
+This takes a comma-delimited set of port numbers and ranges, e.g.:
"site.global.slider.allowed.ports": "48000, 49000, 50001-50010"
-
-The AM exposed ports (Web UI, RPC), as well as the ports allocated to launched
application containers, will be limited to the ranges specified by the property
value.
+
+The AM exposed ports (Web UI, RPC), as well as the ports allocated to launched
+application containers, will be limited to the ranges specified by the
property value.
#### Delaying container launch
-In situations where container restarts may need to be delayed to allow for
platform resources to be released (e.g. a port assigned to a container that is
stopped may be slow to release), a delay can be designated by setting the
"container.launch.delay.sec" property in the component's configuration section:
+
+In situations where container restarts may need to be delayed to allow for
+platform resources to be released (e.g. a port assigned to a previous container
+may be slow to release), a delay can be designated by setting the
`container.launch.delay.sec` property.
"worker": {
"jvm.heapsite": "512M",
@@ -425,14 +413,20 @@ In situations where container restarts m
}
#### Specifying the Python Executable Path
-Currently the Slider containers leverage python for component scripts (the
scripts responsible for component lifecycle operations). When deploying
applications on certain variations of linux or other operating systems (e.g.
Centos 5) , the version of python on the system path may be incompatible with
the component script (e.g. methods or imports utilized are not available). In
those circumstances the path to the python executable for container script
execution can be specified by the "agent.python.exec.path" property:
+
+Slider containers use python to run component scripts (the scripts responsible for
+component lifecycle operations).
+When deploying applications on certain variations of Linux or other operating systems (e.g. CentOS 5),
+the version of python on the system PATH may be incompatible with the component script.
+In those circumstances the path to the python executable for container script execution can be
+specified by the `agent.python.exec.path` property:
"global": {
"agent.python.exec.path": "/usr/bin/python",
. . .
}
-This property may also be specified in the slider-client.xml file (typically
in the "conf" directory of the slider installation) if the python version
specified is to be utilized across multiple deployments:
+This property may also be specified in the `slider-client.xml` file (typically
in the "conf" directory
+of the slider installation) if the python version specified is to be utilized
across multiple deployments:
<property>
<name>agent.python.exec.path</name>
Modified: incubator/slider/site/trunk/content/docs/configuration/index.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/index.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/configuration/index.md (original)
+++ incubator/slider/site/trunk/content/docs/configuration/index.md Wed Mar 25
17:59:26 2015
@@ -25,15 +25,8 @@ requirements.
1. The dynamic description of the running application, including information
on the location of components and aggregated statistics.
-The specifics of this are covered in the [Core Configuration
Specification](core.html)
-
-## Historical References
-
-1. [Specification](specification.html)
-1. [Redesign](redesign.html)
-
-
-1. [Example: current](original-hbase.json)
-1. [Example: proposed](proposed-hbase.json)
+* [Core Configuration Specification](core.html)
+* [internal.json](internal.html)
+* [resources.json](resource_specification.html)
Added: incubator/slider/site/trunk/content/docs/configuration/internal.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/internal.md?rev=1669188&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/docs/configuration/internal.md (added)
+++ incubator/slider/site/trunk/content/docs/configuration/internal.md Wed Mar
25 17:59:26 2015
@@ -0,0 +1,55 @@
+<!---
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+# internal.json: slider's internal configuration document
+
+
+## Internal information, `internal.json`
+
+This contains internal data related to the deployment; it is not
+intended for normal use.
+
+There MAY be a component, `diagnostics`. If defined, its content contains
+diagnostic information for support calls, and MUST NOT be interpreted
+during application deployment (though it may be included in the generation
+of diagnostics reports).
+
+
+ {
+ "schema": "http://example.org/specification/v2.0.0",
+
+ "metadata": {
+ "description": "Internal configuration DO NOT EDIT"
+ },
+ "global": {
+ "name": "small_cluster",
+ "application": "hdfs://cluster:8020/apps/hbase/v/1.0.0/application.tar"
+ },
+ "components": {
+
+ "diagnostics": {
+ "create.hadoop.deployed.info": "(release-2.3.0) @dfe463",
+ "create.hadoop.build.info": "2.3.0",
+ "create.time.millis": "1393512091276",
+ "create.time": "27 Feb 2014 14:41:31 GMT"
+ }
+ }
+ }
+
+## Chaos Monkey
+
+The Slider application has a built-in "Chaos Monkey", which is configured in the `internal.json`
+file:
+
+Consult ["configuring the Slider Chaos
Monkey"](../developing/chaosmonkey.html) for details.
Copied:
incubator/slider/site/trunk/content/docs/configuration/resource_specification.md
(from r1668701,
incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md)
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/resource_specification.md?p2=incubator/slider/site/trunk/content/docs/configuration/resource_specification.md&p1=incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md&r1=1668701&r2=1669188&rev=1669188&view=diff
==============================================================================
---
incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md
(original)
+++
incubator/slider/site/trunk/content/docs/configuration/resource_specification.md
Wed Mar 25 17:59:26 2015
@@ -17,31 +17,159 @@
# Apache Slider Resource Specification
+* [Core Properties](#core)
* [Container Failure Policy](#failurepolicy)
-* [Using Labels](#labels)
-* [Specifying Log Aggregation](#logagg)
+* [Placement Policies and escalation](#placement)
+* [Labels](#labels)
+* [Log Aggregation](#logagg)
+
+
+The Resource specification file, `resources.json`, defines the YARN resource needs for each
+component type that belongs to the application.
+This includes:
+
+* container CPU and memory requirements
+* component placement policy, including YARN labels to explicitly request nodes on
+* failure policy: what to do if components keep failing
+* placement escalation policy
+* where logs generated by applications will be saved; this information is passed to YARN so
+that these logs can be copied to HDFS and retrieved remotely, even while the application is running
+
+As such, it is the core file used by Slider to configure and manage the application.
-Resource specification is an input to Slider to specify the Yarn resource
needs for each component type that belong to the application.
+## <a name="core"></a>Core Properties
An example resource requirement for an application that has two components
"master" and "worker" is as follows. Slider will automatically add the
requirements for the AppMaster for the application. This component is named
"slider-appmaster".
Some parameters that can be specified for a component instance include:
-* `yarn.memory`: amount of memory required for the component instance
-* `yarn.vcores`: number of vcores requested
-* `yarn.role.priority`: each component must be assigned unique priority.
Component with higher priority come up earlier than components with lower
priority
-* `yarn.component.instances`: number of instances for this component type
+<table>
+ <tr>
+ <td>yarn.component.instances</td>
+ <td>
+ Number of instances of this component type
+ </td>
+ </tr>
+ <tr>
+ <td>yarn.memory</td>
+ <td>
+ Amount of memory in MB required for the component instance.
+ </td>
+ </tr>
+ <tr>
+ <td>yarn.vcores</td>
+ <td>
+ Number of "virtual cores" requested
+ </td>
+ </tr>
+ <tr>
+ <td>yarn.role.priority</td>
+ <td>
+ Unique priority for this component
+ </td>
+ </tr>
+
+
+</table>
+
+
+### Component instance count: `yarn.component.instances`
+
+The property `yarn.component.instances` is one of the most fundamental in Slider:
+it declares how many instances of a component to instantiate on the cluster.
+
+If the value is set to "0", no instances of a component will be created. If set
+to a larger number, more instances will be requested. Thus the property sets
the size
+of the application, component-by-component.
+
+The number of instances of each component is application-specific; there are
no recommended
+values.
+
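+A minimal sketch (the component name and count here are purely illustrative): a component
+section requesting three instances of a hypothetical `worker` component:
+
+    "components": {
+      "worker": {
+        "yarn.component.instances": "3"
+      }
+    }
+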
+### Container resource requirements: `yarn.memory` and `yarn.vcores`
+
+These two properties define how much memory and CPU capacity each
+YARN container of this component requires. YARN will queue
+container requests until enough capacity exists within the cluster
+to satisfy them. When there is capacity, a container is allocated to Slider,
+which then deploys an instance of the component.
+
+The larger these numbers, the more capacity the application gets.
+
+If more memory or CPU is requested than needed, then containers will take
+longer to be allocated than necessary, and other work may not be scheduled:
+the cluster will be under-utilized.
+
+`yarn.memory` declares the amount of memory to ask for in YARN containers; it
should
+be defined for each component based on the expected memory consumption. It is
measured
+in MB.
+
+If the cluster has hard memory limits enabled and the processes in a container
+use more physical or virtual memory than was granted, YARN will kill the container.
+Slider will attempt to recreate the component instance by requesting a new
container,
+though if the number of failures of a component is too great then it will
eventually
+give up and fail the application.
+
+A YARN cluster is usually configured with a minimum container allocation, set
in `yarn-site.xml`
+by the configuration parameter `yarn.scheduler.minimum-allocation-mb`; the
default value is
+1024 MB. It will also have a maximum size set in
`yarn.scheduler.maximum-allocation-mb`;
+the default is 8192, that is, 8GB. Asking for more than this will result in
the request
+being rejected.
+
+
+`yarn.vcores` declares the number of "virtual cores" to request. These are a
site-configured
+fraction of a physical CPU core; if the ratio of virtual to physical is 1:1
then a physical core
+is allocated to each one (this may include a Hyperthreaded Core if enabled in
the BIOS).
+If the ratio is lower, such as 2:1, then each vcore allocates half a physical
one.
+
+This notion of a virtual core is intended to partially isolate applications
from differences
+in cluster performance: a process which needs 2 vcores on one cluster should
ideally
+still ask for 2 vcores on a different cluster, even if the latter has newer CPU parts.
+In practice, it's not so consistent. Ask for more vcores if your process needs more CPU time.
+
+YARN clusters may be configured to throttle CPU usage: if a process tries to
use more than
+has been granted to the container, it will simply be scheduled with less CPU
time. The penalty
+for using more CPU than requested is therefore less destructive than
attempting to
+use more memory than requested/granted.
+
+
+
+#### Relationship between `yarn.memory` and JVM Heap Size
+
+Java applications deployed by Slider usually have a JVM heap size property
which needs
+to be defined as part of the application configuration.
+
+The value of `yarn.memory` MUST be bigger than the heap size allocated to any
JVM, as a JVM
+uses a lot more memory than simply the heap alone. We have found that asking
for at least 50%
+more appears to work, though some experimentation will be needed.
+
+Slider does not attempt to derive a heap size for any component from the YARN
allocation.
+
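+As a sketch of the relationship (values are illustrative, and the 50% margin above is only a
+starting point): a component whose `jvm.heapsize` is set to "512M" in the application
+configuration might be allocated at least 768 MB in `resources.json`:
+
+    "worker": {
+      "yarn.memory": "768",
+      "yarn.vcores": "1"
+    }
+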
+### Component priority: `yarn.role.priority`
+
+The property `yarn.role.priority` has two purposes within Slider:
+
+1. It provides a unique index of individual component types. That is, it is not
+the name of a component that Slider uses to index components, but its priority value.
+1. It defines the priority within an application for YARN to use when
allocating
+components. Components with higher priority get allocated first.
+
+Generally the latter use, YARN allocation priority, is less important for Slider-deployed
+applications than for analytics applications designed to scale to as many nodes as can
+be instantiated. A static Slider cluster has a predefined number of each component to
+request (defined by `yarn.component.instances`), with the memory and CPU requirements of
+each component's container defined by `yarn.memory` and `yarn.vcores`. It will request
+the specified number of components, and keep those requests outstanding until they are
+satisfied.
+
+### Example
-Sample:
{
"schema" : "http://example.org/specification/v2.0.0",
"metadata" : {
},
"global" : {
- "yarn.container.failure.threshold":"10",
- "yarn.container.failure.window.hours":"1"
+
},
"components" : {
"HBASE_MASTER" : {
@@ -51,6 +179,8 @@ Sample:
"yarn.vcores" : "1"
},
"slider-appmaster" : {
+ "yarn.memory" : "1024",
+ "yarn.vcores" : "1"
},
"HBASE_REGIONSERVER" : {
"yarn.role.priority" : "2",
@@ -59,6 +189,16 @@ Sample:
}
}
+## <a name="slider-appmaster"></a>The `slider-appmaster` component
+
+The examples here all have a component `slider-appmaster`. This defines the
settings of
+the application master itself: the memory and CPU it requires, optionally a
label (see
+["Labels"](#labels)). The `yarn.role.priority` value is ignored: the priority
is always "0";
+and the instance count, `yarn.component.instances` is implicitly set to "1".
+
+The entry exists primarily to allow applications to configure the amount of
RAM the AM should
+request.
+
## <a name="failurepolicy"></a>Container Failure Policy
YARN containers hosting component instances may fail. This can happen because
of
@@ -89,17 +229,27 @@ The limits are defined in `resources.jso
This duration can span days.
1. The maximum number of failures of any component in this time period.
+### Failure threshold for a component
-The parameters defining the failure policy are as follows.
-* `yarn.container.failure.threshold`
+The number of times a component may fail within a failure window is
+defined by the property `yarn.container.failure.threshold`.
-The threshold for failures. If set to "0" there are no limits on
+
+If set to "0" there are no limits on
the number of times containers may fail.
+The failure thresholds for individual components can be set independently.
+
-* `yarn.container.failure.window.days`, `yarn.container.failure.window.hours`
-and ``yarn.container.failure.window.minutes`
+### Failure window
+
+The failure window can be set in minutes, hours and days. These must be set
+in the `global` options, as they apply to the application as a whole.
+
+ yarn.container.failure.window.days
+ yarn.container.failure.window.hours
+ yarn.container.failure.window.minutes
These properties define the duration of the window; they are all combined
so the window is, in minutes:
@@ -113,12 +263,14 @@ is exceeded, all failure counts are rese
If the AM itself fails, the failure counts are reset and and the window is
restarted.
-### Per-component and global failure thresholds
-
-The failure thresholds for individual components can be set independently
+The default value is `yarn.container.failure.window.hours=6`; when changing
+the window size, the hours value must be set explicitly, even if it is being set to zero.
### Recommended values
+
+
We recommend having a duration of a few hours for the window, and a
large failure limit proportional to the the number of instances of that
component
@@ -130,16 +282,19 @@ trying to reinstantiate all the componen
repeatedly, eventually slider will conclude that there is a problem and fail
with the exit code 73, `EXIT_DEPLOYMENT_FAILED`.
+
### Example
-Here is a `resource.json` file for an HBase cluster
+Here is a `resource.json` file for an HBase cluster:
- "resources": {
+ {
"schema" : "http://example.org/specification/v2.0.0",
"metadata" : { },
"global" : {
"yarn.container.failure.threshold" : "4",
- "yarn.container.failure.window.hours" : "1'
+      "yarn.container.failure.window.days" : "0",
+      "yarn.container.failure.window.hours" : "1",
+      "yarn.container.failure.window.minutes" : "0"
},
"components" : {
"slider-appmaster" : {
@@ -147,13 +302,13 @@ Here is a `resource.json` file for an HB
"yarn.vcores" : "1",
"yarn.component.instances" : "1"
},
- "master" : {
+ "HBASE_MASTER" : {
"yarn.role.priority" : "1",
"yarn.memory" : "256",
"yarn.vcores" : "1",
"yarn.component.instances" : "2"
},
- "worker" : {
+ "HBASE_REGIONSERVER" : {
"yarn.role.priority" : "2",
"yarn.memory" : "512",
"yarn.container.failure.threshold" : "15",
@@ -165,13 +320,18 @@ Here is a `resource.json` file for an HB
The window size is set to one hour: after that the counters are reset.
-There is a global failure threshold of 4. As two instances of the HBase master
-are requested, the failure threshold per hour is double that of the number of
masters.
+There is a global failure threshold of four failures.
+
There are ten worker components requested; the failure threshold for these
-components is overridden to be fifteen. This allows all workers to fail and
-the cluster to recover âbut only another five failures would be tolerated
-for the remaining hour.
+components is overridden to be fifteen. Given that there are more region servers
+than masters, a higher failure rate of worker nodes is to be expected **if the
+cause of the failure is the underlying hardware**.
+
+Choosing a higher value for the region servers ensures that the application is resilient
+to hardware problems. If there were some configuration problem in the region server
+deployments, resulting in them all failing rapidly, this threshold would soon be breached,
+which would cause the application to fail. Thus, configuration problems would be detected.
These failure thresholds are all heuristics. When initially configuring an
application instance, low thresholds reduce the disruption caused by components
@@ -181,8 +341,172 @@ In a production application, large failu
ensures that the application is resilient to transient failures of the
underlying
YARN cluster and hardware.
+
+## <a name="placement"></a>Placement Policies and escalation
+
+Slider can be configured with different options for **placement**: the
+policies by which it chooses where to ask YARN for nodes.
+
+### Placement Policy
+
+The "placement policy" of a component is the set of rules by which Slider makes
+a decision on where to request instances of that component from YARN.
+
+<table>
+<tr>
+ <td>0</td>
+ <td>
+ Default: placement is spread
+ across the cluster on re-starts, with escalation if requests are
+ unmet. Unreliable nodes are avoided.
+ </td>
+</tr>
+
+<tr>
+ <td>1</td>
+  <td>strict: a component is requested on every node previously used, irrespective
+  of failure history. No escalation takes place.</td>
+</tr>
+
+<tr>
+ <td>2</td>
+ <td>Anywhere. Place requests anywhere and ignore the history.</td>
+</tr>
+
+<tr>
+ <td>4</td>
+ <td>Anti affinity required. This option is not currently supported.</td>
+</tr>
+
+</table>
+
+The placement policy is a binary "or" of all the values, and can be
+set in the property `"yarn.component.placement.policy"`.
+
+Example:
+
+ "HBASE_REST": {
+ "yarn.role.priority": "3",
+ "yarn.component.instances": "1",
+ "yarn.component.placement.policy": "1",
+ "yarn.memory": "556"
+ },
+
+This defines an HBASE_REST component with a placement policy of "1"; strict.
+
+On application restarts Slider will re-request the same node.
+
+If the component were configured to request an explicit port for its REST endpoint,
+then the same URL would reach it whenever this application were deployed,
+provided the host was available and the port not already in use.
+
+#### Notes
+
+1. There's no support for **anti-affinity**, i.e. to mandate that component
+instances must never be deployed on the same hosts. Once YARN adds support for
+this, Slider will support it.
+
+1. Slider never explicitly black-lists nodes. It does track which nodes have
been
+unreliable "recently", and avoids explicitly requesting them. If YARN does
+actually allocate a container there, Slider will attempt to deploy the
component
+there.
+
+1. Apart from an (optional) label, placement policies for the application
master itself
+ cannot be specified. The Application Master is deployed wherever YARN sees
fit.
+
+
+### Node Failure Threshold, `yarn.node.failure.threshold`
+
+The configuration property `yarn.node.failure.threshold` defines how
"unreliable"
+a node must be before it is skipped for placement requests.
+
+1. This is per-component.
+1. It is ignored for "strict" or "anywhere" placements.
+1. It is reset at the same time as the container failure counters; that is, at
+the interval defined by the `yarn.container.failure.window` properties
+
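+A sketch of overriding this threshold for a single component (the value shown is illustrative only):
+
+    "components": {
+      "HBASE_REGIONSERVER": {
+        "yarn.node.failure.threshold": "2"
+      }
+    }
+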
+### Escalation: `yarn.placement.escalate.seconds`
+
+For any component whose placement policy is not "any", Slider saves to HDFS
+a record of the nodes on which instances were running. When starting a cluster,
it uses
+this history to identify hosts on which to request instances.
+
+1. Slider initially asks for nodes on those specific hosts, provided their recent failure
+history is considered acceptable.
+1. It tracks which 'placed' requests are outstanding.
+1. If, after the specified escalation time, YARN containers have not been
allocated
+on those nodes, slider will "escalate" the placement of those requests that are
+outstanding.
+1. It currently does this by cancelling each request and re-requesting a
container on
+that node, this time with the `relaxLocality` flag set.
+1. This tells YARN to seek an alternative location in the cluster if it cannot
+allocate one on the target host.
+1. If there is enough capacity in the cluster, the new node will then be
allocated.
+
+
+The higher the cost of migrating a component instance from one host to
another, the longer
+we would recommend for an escalation timeout.
+
+Example:
+
+ {
+ "schema": "http://example.org/specification/v2.0.0",
+ "metadata": {
+ },
+ "global": {
+ },
+ "components": {
+ "HBASE_MASTER": {
+ "yarn.role.priority": "1",
+ "yarn.component.instances": "1",
+ "yarn.placement.escalate.seconds": "10"
+ },
+ "HBASE_REGIONSERVER": {
+ "yarn.role.priority": "2",
+ "yarn.component.instances": "10",
+ "yarn.placement.escalate.seconds": "600"
+ },
+ "slider-appmaster": {
+ }
+ }
+ }
+
+This declares that the `HBASE_MASTER` placement should be escalated after ten seconds,
+but that `HBASE_REGIONSERVER` instances should have an escalation timeout of 600
+seconds, ten minutes. These values were chosen because an HBase Master can be allocated
+anywhere in the cluster, but a region server is significantly faster if restarted
+on the same node on which it previously saved all its data. Even though HDFS will
+have replicated all data elsewhere, it will have been scattered across the cluster,
+resulting in remote access for most of the data, at least until a full compaction
+has taken place.
+
+
+#### Notes
+
+1. Escalation goes directly from "specific node" to "anywhere in cluster"; it
does
+not have any intermediate "same-rack" policy.
+
+1. If components were assigned to specific labels, then even when placement is
+"escalated", Slider will always ask for containers on the specified labels.
That
+is: it will never relax the constraint of "deploy on the labels specified".
If
+there are not enough labelled nodes for the application, either the cluster
+administrators need to add more labelled nodes, or the application must be
reconfigured
+with a different label policy.
+
+1. Escalated components may be allocated containers on nodes which already
have a running
+instance of the same component.
+
+1. If the placement policy is "strict", there is no escalation. If the node
+is not available or lacks capacity, the request will remain unsatisfied.
+
+1. There is no placement escalation option for the application master.
+
+1. For more details, see: [Role History](/design/rolehistory.html)
+
+
## <a name="labels"></a>Using Labels
-The resources.json file can be used to specify the labels to be used when
allocating containers for the components. The details of the YARN Label feature
can be found at [YARN-796](https://issues.apache.org/jira/browse/YARN-796).
+
+The `resources.json` file can specify the labels to be used when
allocating containers for the components. The details of the YARN Label feature
can be found at [YARN-796](https://issues.apache.org/jira/browse/YARN-796).
In summary:
@@ -193,55 +517,77 @@ In summary:
This way, you can guarantee that a certain set of nodes are reserved for an
application or for a component within an application.
-Label expression is specified through property "yarn.label.expression". When
no label expression is specified then it is assumed that only non-labeled nodes
are used when allocating containers for component instances.
+Label expression is specified through property `yarn.label.expression`. When
no label expression is specified then it is assumed that only non-labeled nodes
are used when allocating containers for component instances.
+
+If a label expression is specified for the `slider-appmaster` component then it also becomes the default label expression for all components.
-If label expression is specified for slider-appmaster then it also becomes the
default label expression for all component. To take advantage of default label
expression leave out the property (see HBASE_REGIONSERVER in the example).
Label expression with empty string ("yarn.label.expression":"") means nodes
without labels.
+#### Example
-Example
-Here is a resource.json file for an HBase cluster which uses labels. The label
for the application instance is "hbase1" and the label expression for the
HBASE_MASTER components is "hbase1_master". HBASE_REGIONSERVER instances will
automatically use label "hbase1". Alternatively, if you specify
("yarn.label.expression":"") for HBASE_REGIONSERVER then the containers will
only be allocated on nodes with no labels.
+Here is a `resource.json` file for an HBase cluster which uses labels.
+
+The label for the application master is `hbase1` and the label expression for
the HBASE_MASTER components is `hbase1_master`.
+`HBASE_REGIONSERVER` instances will automatically use label `hbase1`.
{
- "schema": "http://example.org/specification/v2.0.0",
- "metadata": {
+ "schema": "http://example.org/specification/v2.0.0",
+ "metadata": {
+ },
+ "global": {
+ },
+ "components": {
+ "HBASE_MASTER": {
+ "yarn.role.priority": "1",
+ "yarn.component.instances": "1",
+ "yarn.label.expression":"hbase1_master"
},
- "global": {
+ "HBASE_REGIONSERVER": {
+ "yarn.role.priority": "2",
+ "yarn.component.instances": "10",
},
- "components": {
- "HBASE_MASTER": {
- "yarn.role.priority": "1",
- "yarn.component.instances": "1",
- "yarn.label.expression":"hbase1_master"
- },
- "HBASE_REGIONSERVER": {
- "yarn.role.priority": "1",
- "yarn.component.instances": "1",
- },
- "slider-appmaster": {
- "yarn.label.expression":"hbase1"
- }
+ "slider-appmaster": {
+ "yarn.label.expression":"hbase1"
}
+ }
}
-Specifically, for the above example you will need:
+To deploy this application in a YARN cluster, the following steps must be
followed.
+
+1. Create two labels, `hbase1` and `hbase1_master` (use `yarn rmadmin`
commands)
+1. Assign the labels to nodes (use `yarn rmadmin` commands)
+1. Perform refresh queue (`yarn -refreshqueue`)
+1. Create a queue by defining it in the capacity scheduler configuration.
+1. Allow the queue access to the labels and ensure that appropriate min/max capacity is assigned.
+1. Perform refresh queue (`yarn -refreshqueue`)
+1. Create the Slider application against the above queue, using the `--queue` parameter when creating the application.
+
+### Notes
+
+1. If a label is defined in the `global` section, it will also apply to all components which do
+not explicitly identify a label. If such a label expression is set there and another is defined
+for the `slider-appmaster`, the app master's label is only used for its own placement.
+
+1. To explicitly request that a component is not placed on labelled nodes, irrespective of
+any global or appmaster settings, set the `yarn.label.expression` to an empty string:
+
+ "HBASE_REGIONSERVER": {
+ "yarn.role.priority": "2",
+ "yarn.component.instances": "10",
+ "yarn.label.expression":""
+ }
-* Create two labels, hbase1 and hbase1_master (use yarn rmadmin commands)
-* Assign the labels to nodes (use yarn rmadmin commands)
-* Perform refresh queue (yarn -refreshqueue)
-* Create a queue by defining it in the capacity scheduler config
-* Allow the queue to access to the labels and ensure that appropriate min/max
capacity is assigned
-* Perform refresh queue (yarn -refreshqueue)
-* Create the Slider application against the above queue using parameter
--queue while creating the application
+1. If there is not enough capacity within a set of labelled nodes for the
requested containers,
+the application instance will not reach its requested size.
+## <a name="logagg"></a>Log Aggregation
-## <a name="logagg"></a>Using Log Aggregation
Log aggregation at regular intervals for long running services (LRS) needs to
be enabled at the YARN level before
any application can exploit this functionality. To enable set the following
property to a positive value of 3600 (in secs)
or more. If set to a positive value less than 3600 (1 hour) this property
defaults to 3600. To disable log aggregation
set it to -1.
<property>
-
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
- <value>3600</value>
+
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
+ <value>3600</value>
</property>
Subsequently every application owner has the flexibility to set the include
and exclude patterns of file names that
@@ -250,27 +596,105 @@ of files that need to be backed up under
set at the global level as shown below -
{
- "schema": "http://example.org/specification/v2.0.0",
- "metadata": {
+ "schema": "http://example.org/specification/v2.0.0",
+ "metadata": {
+ },
+ "global": {
+ "yarn.log.include.patterns": "hbase*.*",
+ "yarn.log.exclude.patterns": "hbase*.out"
+ },
+ "components": {
+ "HBASE_MASTER": {
+ "yarn.role.priority": "1",
+ "yarn.component.instances": "1",
+ },
+ "HBASE_REGIONSERVER": {
+ "yarn.role.priority": "2",
+ "yarn.component.instances": "10",
},
- "global": {
- "yarn.log.include.patterns": "hbase*.*",
- "yarn.log.exclude.patterns": "hbase*.out"
- },
- "components": {
- "HBASE_MASTER": {
- "yarn.role.priority": "1",
- "yarn.component.instances": "1",
- },
- "HBASE_REGIONSERVER": {
- "yarn.role.priority": "1",
- "yarn.component.instances": "1",
- },
- "slider-appmaster": {
- }
+ "slider-appmaster": {
}
+ }
}
The details of the YARN Log Aggregation feature can be found at
[YARN-2468](https://issues.apache.org/jira/browse/YARN-2468).
+
+## Putting it all together
+
+Here is an example of a definition of an HBase cluster.
+
+
+
+ {
+ "schema": "http://example.org/specification/v2.0.0",
+ "metadata": {
+ },
+ "global": {
+ "yarn.log.include.patterns": "hbase*.*",
+ "yarn.log.exclude.patterns": "hbase*.out",
+ "yarn.container.failure.window.hours": "0",
+ "yarn.container.failure.window.minutes": "30",
+ "yarn.label.expression":"development"
+ },
+ "components": {
+ "slider-appmaster": {
+ "yarn.memory": "1024",
+        "yarn.vcores": "1",
+ "yarn.label.expression":""
+ },
+ "HBASE_MASTER": {
+ "yarn.role.priority": "1",
+ "yarn.component.instances": "1",
+ "yarn.placement.escalate.seconds": "10",
+ "yarn.vcores": "1",
+ "yarn.memory": "1500"
+ },
+ "HBASE_REGIONSERVER": {
+ "yarn.role.priority": "2",
+        "yarn.component.instances": "10",
+ "yarn.vcores": "1",
+ "yarn.memory": "1500",
+ "yarn.container.failure.threshold": "15",
+ "yarn.placement.escalate.seconds": "60"
+ },
+ "HBASE_REST": {
+ "yarn.role.priority": "3",
+ "yarn.component.instances": "1",
+ "yarn.component.placement.policy": "1",
+ "yarn.container.failure.threshold": "3",
+ "yarn.vcores": "1",
+ "yarn.memory": "556"
+ },
+ "HBASE_THRIFT": {
+ "yarn.role.priority": "4",
+ "yarn.component.instances": "0",
+ "yarn.component.placement.policy": "1",
+ "yarn.vcores": "1",
+        "yarn.memory": "556",
+ "yarn.label.expression":"stable"
+ },
+ "HBASE_THRIFT2": {
+ "yarn.role.priority": "5",
+ "yarn.component.instances": "1",
+ "yarn.component.placement.policy": "1",
+ "yarn.vcores": "1",
+        "yarn.memory": "556",
+ "yarn.label.expression":"stable"
+ }
+ }
+ }
+
+There are ten region servers, with a 60-second timeout for placement
escalation;
+15 containers can fail in the "recent" time window before the application is
+considered to have failed.
+
+The time window to reset failures is set to 30 minutes.
+
+The Thrift, Thrift2 and REST servers all have strict placement. The REST
+server also has a container failure threshold of 3: if it fails to come up
+three times, the entire application deployment is considered a failure.
+
+The default label expression for components is "development". For the application master itself it is "",
+meaning anywhere. Both Thrift services are requested on the label "stable".
\ No newline at end of file
Copied:
incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md
(from r1660946, incubator/slider/site/trunk/content/docs/configuration/index.md)
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md?p2=incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md&p1=incubator/slider/site/trunk/content/docs/configuration/index.md&r1=1660946&r2=1669188&rev=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/configuration/index.md (original)
+++ incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md
Wed Mar 25 17:59:26 2015
@@ -15,20 +15,10 @@
limitations under the License.
-->
-# Apache Slider: Specification of an application instance, revision 2.0
+# Apache Slider: Specification of an application instance, revision 1
-The specification of an application comprises
-
-1. The persistent description of an application's configuration
-1. The persistent description of the desired topology and YARN resource
-requirements.
-1. The dynamic description of the running application, including information
-on the location of components and aggregated statistics.
-
-The specifics of this are covered in the [Core Configuration
Specification](core.html)
-
-
-## Historical References
+This is the original specification of an application instance, including
discussion
+on a proposed rework.
1. [Specification](specification.html)
1. [Redesign](redesign.html)
@@ -37,3 +27,4 @@ The specifics of this are covered in the
1. [Example: current](original-hbase.json)
1. [Example: proposed](proposed-hbase.json)
+This design has been supplanted by the [version 2.0](../index.html) design.
\ No newline at end of file
Modified: incubator/slider/site/trunk/content/docs/examples.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/examples.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/examples.md (original)
+++ incubator/slider/site/trunk/content/docs/examples.md Wed Mar 25 17:59:26
2015
@@ -102,7 +102,7 @@ or
### Optional: point bin/slider at your chosen cluster configuration
-export
SLIDER_CONF_DIR=~/Projects/slider/slider-core/src/test/configs/ubuntu-secure/slider
+ export
SLIDER_CONF_DIR=~/Projects/slider/slider-core/src/test/configs/ubuntu-secure/slider
## Optional: Clean up any existing slider cluster details
Modified: incubator/slider/site/trunk/content/docs/getting_started.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/getting_started.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/getting_started.md (original)
+++ incubator/slider/site/trunk/content/docs/getting_started.md Wed Mar 25
17:59:26 2015
@@ -240,7 +240,7 @@ As Slider creates each instance of a com
All this information goes into the **Resources Specification** file ("Resource
Spec") named `resources.json`. The Resource Spec tells Slider how many
instances of each component in the application (such as an HBase RegionServer)
to deploy and the parameters for YARN.
-An application package should contain the default resources.json and you can
start from there. Or you can create one based on [Resource
Specification](slider_specs/resource_specification.html).
+An application package should contain the default resources.json and you can start from there. Or you can create one based on the [Resource Specification](configuration/resource_specification.html).
Store the Resource Spec file on your local disk (e.g. `/tmp/resources.json`).
Modified: incubator/slider/site/trunk/content/docs/high_availability.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/high_availability.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/high_availability.md (original)
+++ incubator/slider/site/trunk/content/docs/high_availability.md Wed Mar 25
17:59:26 2015
@@ -116,7 +116,6 @@ for setup details.
<property>
<description>The class to use as the persistent
store.</description>
<name>yarn.resourcemanager.store.class</name>
-
<!ÂÂ--value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</valueÂ-->
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
@@ -126,7 +125,7 @@ for setup details.
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
as the value for yarn.resourcemanager.store.class
</description>
- <name>yarn.resourcemanager.zkÂ-address</name>
+ <name>yarn.resourcemanager.zk.address</name>
<value>127.0.0.1:2181</value>
</property>
Modified: incubator/slider/site/trunk/content/docs/security.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/security.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/security.md (original)
+++ incubator/slider/site/trunk/content/docs/security.md Wed Mar 25 17:59:26
2015
@@ -118,7 +118,7 @@ The Application Master will read in the
relevant number of componentss.
### The Keytab distribution/access Options
- Rather than relying on delegation token based authentication mechanisms, the
AM leverages keytab files for obtaining the principals to authenticate to the
configured cluster KDC. In order to perform this login the AM requires access
to a keytab file that contains the principal representing the user identity to
be associated with the launched application instance (e.g. in an HBase
installation you may elect to leverage the âhbaseâ principal for this
purpose). There are two mechanisms supported for keytab access and/or
distribution:
+ Rather than relying on delegation token based authentication mechanisms, the
AM leverages keytab files for obtaining the principals to authenticate to the
configured cluster KDC. In order to perform this login the AM requires access
to a keytab file that contains the principal representing the user identity to
be associated with the launched application instance (e.g. in an HBase
installation you may elect to leverage the `hbase` principal for this purpose).
There are two mechanisms supported for keytab access and/or distribution:
#### Local Keytab file access:
@@ -129,11 +129,11 @@ relevant number of componentss.
"slider-appmaster": {
"jvm.heapsize": "256M",
"slider.am.keytab.local.path":
"/etc/security/keytabs/hbase.headless.keytab",
- âslider.keytab.principal.nameâ : âhbase"
+ "slider.keytab.principal.name" : "hbase"
}
}
- The âslider.am.keytab.local.pathâ property provides the full path to the
keytab file location and is mandatory for the local lookup mechanism. The
principal to leverage from the file is identified by the
âslider.keytab.principal.nameâ property.
+ The `slider.am.keytab.local.path` property provides the full path to the
keytab file location and is mandatory for the local lookup mechanism. The
principal to leverage from the file is identified by the
`slider.keytab.principal.name` property.
In this scenario the distribution of keytab files for the AM AND the
application itself is the purview of the application deployer. So, for
example, for an hbase deployment, the hbase site service keytab will have to be
distributed as well and indicated in the hbase-site properties:
@@ -152,32 +152,47 @@ relevant number of componentss.
"jvm.heapsize": "256M",
"slider.hdfs.keytab.dir": ".slider/keytabs/hbase",
"slider.am.login.keytab.name": "hbase.headless.keytab",
- âslider.keytab.principal.nameâ : âhbase"
+ "slider.keytab.principal.name" : "hbase"
}
}
- The âslider.hdfs.keytab.dirâ points to an HDFS path, relative to the
userâs home directory (e.g. /users/hbase), in which slider can find all
keytab files required for both AM login as well as application services (e.g.
for hbase that would be the headless keytab for the AM and the service keytab
for the HBase application components). If no value is specified, a default
location of â.slider/keytabs/<cluster name>â is assumed.
- The âslider.am.login.keytab.nameâ is the name of the keytab file
(mandatory property), found within the specified directory, that the AM will
use to lookup up the login principal and authenticate.
-
- If leveraging the slider-based distribution mechanism, the keytab files for
components will be accessible from a âkeytabsâ sub-directory of the
container work folder and can therefore be specified relative to the
$AGENT_WORK_ROOT/keytabs directory, e.g.:
+The `slider.hdfs.keytab.dir` points to an HDFS path, relative to the user's home directory
+(e.g. `/users/hbase`), in which Slider can find all keytab files required for both AM login
+and application services. For example, for Apache HBase these would be the headless keytab
+for the AM and the service keytab for the HBase application components.
+
+If no value is specified, a default location of `.slider/keytabs/<cluster
name>` is assumed.
+
+The `slider.am.login.keytab.name` is the name of the keytab file (mandatory property),
+found within the specified directory, that the AM will use to look up the login principal and authenticate.
+
+When using the slider-based distribution mechanism, the keytab files for
components will be
+accessible from a `keytabs` sub-directory of the container work folder and can
therefore be
+specified relative to the `$AGENT_WORK_ROOT/keytabs` directory, e.g.:
. . .
"site.hbase-site.hbase.master.kerberos.principal":
"hbase/[email protected]",
"site.hbase-site.hbase.master.keytab.file":
"${AGENT_WORK_ROOT}/keytabs/hbase.service.keytab",
. . .
- For both mechanisms above, the principal name used for authentication is
either:
+For both mechanisms above, the principal name used for authentication is
either:
-* The principal name established on the client side before invocation of the
Slider CLI (the principal used to âkinitâ) or
-* The value specified for a âslider.keytab.principal.nameâ property.
+* The principal name established on the client side before invocation of the
Slider CLI (the principal used to `kinit`) or
+* The value specified for a `slider.keytab.principal.name` property.
#### Slider Client Keytab installation:
-The Slider client can be leveraged to install keytab files individually into a
designated keytab HDFS folder. The format of the command is:
+The Slider client can be leveraged to install keytab files individually into a
designated
+keytab HDFS folder. The format of the command is:
    slider install-keytab --keytab <path to keytab on local file system> --folder <name of HDFS folder to store keytab> [--overwrite]
-The command will store the keytab file specified by the "--keytab" option
in to an HDFS folder that is created or exists under
/user/username/.slider/keytabs named by the "--folder" option (e.g. if the
folder name specified is "HBASE" the keytab will be stored in
/user/username/.slider/keytabs/HBASE). The command can be used to upload
keytab files individually up to HDFS. For example, if uploading both AM and
HBase service keytabs to the "HBASE" folder, the command will be invoked
twice:
+The command will store the keytab file specified by the `--keytab` option into
+an HDFS folder that is created or exists under
`/user/username/.slider/keytabs` named by the
+`--folder` option (e.g. if the folder name specified is `HBASE` the keytab
will be stored in
+`/user/username/.slider/keytabs/HBASE`).
+The command can be used to upload keytab files individually to HDFS.
+For example, if uploading both AM and HBase service keytabs to the `HBASE`
folder, the command will be invoked twice:
slider install-keytab --keytab
/my/local/keytabs/folder/hbase.headless.keytab --folder HBASE
slider install-keytab --keytab
/my/local/keytabs/folder/hbase.service.keytab --folder HBASE
@@ -195,10 +210,10 @@ Subsequently, the associated hbase-site
"jvm.heapsize": "256M",
"slider.hdfs.keytab.dir": ".slider/keytabs/HBASE",
"slider.am.login.keytab.name": "hbase.headless.keytab"
- "slider.keytab.principal.name" : "hbase"
+ "slider.keytab.principal.name" : "hbase"
}
}
-
+
## Securing communications between the Slider Client and the Slider AM.
When the AM is deployed in a secure cluster,
@@ -255,16 +270,25 @@ They can also be set on the Slider comma
-S java.security.krb5.realm=MINICLUSTER -S
java.security.krb5.kdc=hadoop-kdc
## Generation and deployment of application keystores/truststores
-Application components may make use of keystores and truststores to establish
secure communications. Given the nature of application deployments in a YARN
cluster and the lack of certainty concerning the nodemanager host on which a
component container may be spawned, Slider provides the facility for creating
and deploying the keystores and truststores that may be required.
+
+Application components may make use of keystores and truststores to establish
secure communications.
+Given the nature of application deployments in a YARN cluster and the lack of
certainty concerning
+the host on which a component container may be allocated,
+Slider provides the facility for creating and deploying the keystores and
truststores that may be required.
The process of enabling application keystore/truststore generation and
deployment is:
-* Set the "slider.component.security.stores.required" property to "true".
This property can be set as a global property (indicating all components
require stores) or can be set/overridden at the component level to selectively
enable store generation for a given component.
+* Set the configuration option `"slider.component.security.stores.required"`
to `"true"`.
+ This option can be set as a global property (indicating all components
require stores) or can be set/overridden at the component level to selectively
enable store generation for a given component (see the sketch after this list).
* Specify the password property for the component keystore or truststore, or
* Specify the property providing the alias that references a credential
managed by the Hadoop Credential Provider. This credential provides the
password for securing the keystore/truststore.
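As an illustration only (the component name `APP_COMPONENT` and the values shown are
placeholders, not part of any shipped sample), an appConfig fragment that leaves store
generation off globally but enables it for a single component might look like:

    "global": {
      "slider.component.security.stores.required": "false"
    },
    "components": {
      "APP_COMPONENT": {
        "slider.component.security.stores.required": "true"
      }
    }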
### Specifying a keystore/truststore password
-Applications that make use of a keystore and/or truststore may already have
configuration properties that reference the value for the password used to
secure the given certificate store. In those instances the application
configuration can reference the value of the password property in the component
specific configuration section:
+
+Applications that make use of a keystore and/or truststore may already have
configuration
+properties that reference the value for the password used to secure the given
certificate store.
+In those instances the application configuration can reference the value of
the password property
+in the component specific configuration section:
"APP_COMPONENT": {
"slider.component.security.stores.required": "true",
@@ -273,13 +297,14 @@ Applications that make use of a keystore
In this example:
-* The store required property is set to "true" for the APP_COMPONENT component
-* The application has a property in its site configuration file named
"app_component.keystore.password". This property is specified in the appConfig
file's global section (with the "site.myapp-site" prefix), and is referenced
here to indicate to Slider which application property provides the store
password.
+* The store required property is set to `"true"` for the `APP_COMPONENT`
component
+* The application has a property in its site configuration file named
`"app_component.keystore.password"`.
+This property is specified in the appConfig file's global section (with the
"site.myapp-site" prefix), and is referenced here to indicate to Slider which
application property provides the store password.
### Specifying a keystore/truststore Credential Provider alias
Applications that utilize the Credential Provider API to retrieve application
passwords can specify the following configuration:
-* Indicate the credential storage path in the "credentials" section of the app
configuration file:
+* Indicate the credential storage path in the `credentials` section of the app
configuration file:
"credentials": {
"jceks://hdfs/user/${USER}/myapp.jceks":
["app_component.keystore.password.alias"]
@@ -302,7 +327,9 @@ At runtime, Slider will read the credent
When trying to talk to a secure cluster, you may see the message:
No valid credentials provided (Mechanism level: Illegal key size)]
-
+or
+ No valid credentials provided (Mechanism level: Failed to find any
Kerberos tgt)
+
This means that the JRE does not have the extended cryptography package
needed to work with the keys that Kerberos needs. This must be downloaded
from Oracle (or other supplier of the JVM) and installed according to
Modified:
incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
---
incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md
(original)
+++
incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md
Wed Mar 25 17:59:26 2015
@@ -84,7 +84,7 @@ An appConfig.json contains the applicati
appConf.json allows you to pass in an arbitrary set of configuration that Slider
will forward to the application component instances.
## Variable naming convention
-In order to understand how the naming convention work, lets look at how the
config is passed on to component commands. Slider agent recevies a structured
bag of commands as input for all commands, INSTALL, CONFIGURE, START, etc. The
command includes a section "configuration" which has config properties arranged
into named property bags.
+In order to understand how the naming convention works, let's look at how the
config is passed on to component commands. The Slider agent receives a structured
bag of commands as input for all commands, INSTALL, CONFIGURE, START, etc. The
command includes a section "configuration" which has config properties arranged
into named property bags.
* Variables of the form `site.xx.yy` translate to variables by the name `yy`
within the group `xx` and are typically converted to site config files by the
name `xx` containing variable `yy`. For example,
`"site.hbase-site.hbase.regionserver.port":""` will be sent to the Slider-Agent
as `"hbase-site" : { "hbase.regionserver.port": ""}` and app definition scripts
can access all variables under `hbase-site` as a single property bag (see the sketch after this list).
* Similarly, `site.core-site.fs.defaultFS` allows you to pass in the default
fs. *This specific variable is automatically made available by Slider but it's
shown here as an example.*
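As an illustrative sketch of this mapping (the port value below is just a placeholder),
an appConfig entry such as

    "global": {
      "site.hbase-site.hbase.regionserver.port": "0"
    }

is delivered to the Slider agent inside the command's "configuration" section as

    "configuration": {
      "hbase-site": {
        "hbase.regionserver.port": "0"
      }
    }

so application scripts can read the whole `hbase-site` property bag at once.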
@@ -114,7 +114,7 @@ For example, HBase master info port need
There is no set guideline for doing so. How an application emits metrics and
how the metrics are emitted to the right place is completely defined by the
application. In the following example, we show how the HBase app is configured to
emit metrics to a Ganglia server.
-Ganglia server lifecycle is not controlled by the app instance. So the app
instance only needs to know where to emit the metrics. This is achieved by
three global variables
+The Ganglia server lifecycle is not controlled by the app instance, so the app
instance only needs to know where to emit the metrics. This is achieved through
the following global variables:
* "site.global.ganglia_enabled":"true"
* "site.global.ganglia_server_host": "gangliaserver.my.org"
Modified:
incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
---
incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md
(original)
+++
incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md
Wed Mar 25 17:59:26 2015
@@ -26,7 +26,7 @@ An application instance consists of seve
Figure 1 - High-level view of a container
For example:
-
+
yarn 8849 -- python ./infra/agent/slider-agent/agent/main.py --label
container_1397675825552_0011_01_000003___HBASE_REGIONSERVER --host AM_HOST
--port 47830
yarn 9085 -- bash
/hadoop/yarn/local/usercache/yarn/appcache/application_1397675825552_0011/ ...
internal_start regionserver
yarn 9114 -- /usr/jdk64/jdk1.7.0_45/bin/java -Dproc_regionserver
-XX:OnOutOfMemoryError=...
@@ -37,7 +37,7 @@ The above list shows three processes, th
The following command creates an HBase application using the AppPackage for
HBase.
./slider create cl1 --template /work/appConf.json --resources
/work/resources.json
-
+
Let's analyze the various parameters from the perspective of app creation:
* `--template`: app configuration
@@ -110,12 +110,12 @@ Looking at the content through unzip -l
Sample **resources-default.json** and **appConfig-default.json** files are
also included in the enlistment. These are samples and are typically tested on
one-node test installations. These files are not used during the create
command; rather, the files provided as input parameters are the ones that are
used. *So you can leave these files as is in the package.*
-### --template appConfig.json
+### `--template appConfig.json`
An appConfig.json contains the application configuration. See [Specifications
InstanceConfiguration](application_instance_configuration.html) for details on
how to create a template config file. The enlistment includes sample config
files for HBase, Accumulo, and Storm.
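As a rough, hypothetical sketch (the package path and values below are placeholders,
not taken from any shipped sample), an appConfig.json generally declares a schema,
a global section, and a components section:

    {
      "schema": "http://example.org/specification/v2.0.0",
      "metadata": {},
      "global": {
        "application.def": ".slider/package/MYAPP/myapp.zip",
        "site.global.app_user": "yarn"
      },
      "components": {
        "slider-appmaster": {
          "jvm.heapsize": "256M"
        }
      }
    }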
-### --resources resources.json
-Resource specification is an input to Slider to specify the Yarn resource
needs for each component type that belong to the application. [Specification of
Resources](resource_specification.html) describes how to write a resource
config json file. The enlistment includes sample config files for HBase,
Accumulo, and Storm.
+### `--resources resources.json`
+Resource specification is an input to Slider to specify the YARN resource
needs for each component type that belongs to the application. [Specification of
Resources](/docs/configuration/resource.html) describes how to write a resource
config json file. The enlistment includes sample config files for HBase,
Accumulo, and Storm.
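As a hypothetical sketch (the component name `MY_COMPONENT`, priority, and resource
sizes are placeholders), a resources.json lists per-component YARN resource needs
alongside the mandatory `slider-appmaster` entry:

    {
      "schema": "http://example.org/specification/v2.0.0",
      "metadata": {},
      "global": {},
      "components": {
        "slider-appmaster": {},
        "MY_COMPONENT": {
          "yarn.role.priority": "1",
          "yarn.component.instances": "2",
          "yarn.memory": "512",
          "yarn.vcores": "1"
        }
      }
    }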
## Scripting for AppPackage
Modified:
incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
---
incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md
(original)
+++
incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md
Wed Mar 25 17:59:26 2015
@@ -106,7 +106,8 @@ Most applications release a tarball that
## Step 3: Create a default resources file (resources.json)
-By default all resources.json file must include slider-appmaster. Add one more
entry for the component MEMCACHED and assign a unique priority and default
number of instances. Ensure, that a suitable default value is provided for
yarn.memory. Additional details are available
[here](/docs/slider_specs/resource_specification.html).
+By default all `resources.json` files must include a `slider-appmaster`
component.
+Add one more entry for the component `MEMCACHED` and assign a unique priority
and default number of instances. Ensure that a suitable default value is
provided for `yarn.memory`. Additional details are available
[here](/docs/configuration/resource.html).
{
Modified: incubator/slider/site/trunk/content/docs/slider_specs/index.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/index.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/slider_specs/index.md (original)
+++ incubator/slider/site/trunk/content/docs/slider_specs/index.md Wed Mar 25
17:59:26 2015
@@ -40,18 +40,20 @@
Refer to [Creating a Slider package for
Memcached](hello_world_slider_app.html) for a quick overview of how to write a
Slider app.
-Packaging enhancements: [Simplified Packaging](simple_pkg.html) describes a
simplified version of packaging that Slider supports for applications that do
not need full capability of a Slider application package. *The work is
available in the develop branch and is targeted for the next relase.*
+Packaging enhancements: [Simplified Packaging](simple_pkg.html) describes a
simplified version of packaging that Slider
+supports for applications that do not need the full capability of a Slider
application package.
+*The work is available in the develop branch and is targeted for the next
release.*
-The entry points to leverage Slider are:
+The entry points to use Slider are:
- [Application Needs](application_needs.html) What it takes to be deployable
by Slider.
- [Slider AppPackage](creating_app_definitions.html) Overview of how to create
a Slider AppPackage.
- [Specifications for AppPackage](application_package.html) Describes the
structure of an AppPackage
-- [Specifications for Application Definition](application_definition.html) How
to write metainfo.xml?
-- [Specification of Resources](resource_specification.html) How to write a
resource spec for an app?
-- [Specifications
InstanceConfiguration](application_instance_configuration.html) How to write a
template config for an app?
+- [Specifications for Application Definition](application_definition.html) How
to write metainfo.xml
+- [Specification of Resources](/docs/configuration/resource.html) How to write
a resource spec for an app
+- [Specifications
InstanceConfiguration](application_instance_configuration.html) How to write a
template config for an app
- [Specifications for Configuration](application_configuration.html) Default
application configuration
- [Specifying Exports](specifying_exports.html) How to specify exports for an
application?
- [Documentation for "General Developer Guidelines"](/developing/index.html)
-* [Configuring the Slider Chaos Monkey](chaosmonkey.html)
-
+
+
Modified:
incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md
(original)
+++ incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md
Wed Mar 25 17:59:26 2015
@@ -30,7 +30,7 @@ All exports are specified in the metadat
Slider application packages accept an appConfig.json file for all application
configuration supplied by the user. Any property whose name starts with "site"
is considered configuration. [Specifications
InstanceConfiguration](application_instance_configuration.html) describes the
naming convention.
### Export specific configs
-By default all configurations are exported (e.g.
http://hos1:44500/ws/v1/slider/publisher/slider/storm-site). They can be
disabled by specifying `<exportedConfigs>None</exportedConfigs>` under
`<application>`. If you want to explicitly specify what to publish you can use
comma separated named such as
`<exportedConfigs>storm-site,another-site</exportedConfigs>`.
+By default all configurations are exported (e.g.
`http://host1:44500/ws/v1/slider/publisher/slider/storm-site`). They can be
disabled by specifying `<exportedConfigs>None</exportedConfigs>` under
`<application>`. If you want to explicitly specify what to publish you can use
comma-separated names such as
`<exportedConfigs>storm-site,another-site</exportedConfigs>`.
### Which component is responsible for export
By default an arbitrary master is chosen as the master responsible for
exporting the config. *What this means is that when this master is STARTED the
applied config known at that time is exported*. Otherwise, you can specify
which master component type should export configuration by specifying
`<publishConfig>true</publishConfig>` under `<component>`.
Modified: incubator/slider/site/trunk/content/docs/ssl.md
URL:
http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/ssl.md?rev=1669188&r1=1669187&r2=1669188&view=diff
==============================================================================
--- incubator/slider/site/trunk/content/docs/ssl.md (original)
+++ incubator/slider/site/trunk/content/docs/ssl.md Wed Mar 25 17:59:26 2015
@@ -1,3 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
# Set Up Two-Way SSL Between the Slider Agents and the Application Master
Two-way SSL provides a higher level of secure communication between the Slider
Application Master and Agents by requiring both to verify each other's identity
prior to the exchange of HTTP requests and responses. By default the
communication mechanism between the two is One-Way SSL. To enable Two-way SSL:
@@ -11,4 +29,4 @@ Two-way SSL provides a higher level of s
}
}
-* Create and start the cluster (e.g. by using the slider command line
leveraging the "create" option)
+* Create and start the cluster (e.g. by using the slider command line and the
"create" option)