Author: buildbot
Date: Thu Mar 19 16:48:45 2015
New Revision: 944362
Log:
Staging update by buildbot for slider
Modified:
websites/staging/slider/trunk/content/ (props changed)
websites/staging/slider/trunk/content/design/rolehistory.html
Propchange: websites/staging/slider/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Mar 19 16:48:45 2015
@@ -1 +1 @@
-1667103
+1667813
Modified: websites/staging/slider/trunk/content/design/rolehistory.html
==============================================================================
--- websites/staging/slider/trunk/content/design/rolehistory.html (original)
+++ websites/staging/slider/trunk/content/design/rolehistory.html Thu Mar 19
16:48:45 2015
@@ -177,35 +177,64 @@ Latest release: <strong>0.60.0-incubatin
limitations under the License.
-->
-<h1
id="apache-slider-role-history-how-slider-brings-back-nodes-in-the-same-location">Apache
Slider Role History: how Slider brings back nodes in the same location</h1>
-<h3 id="last-updated-2013-12-06">Last updated 2013-12-06</h3>
-<ul>
-<li>This document uses the pre-slider terminology of role/cluster and not
-component and application instance *</li>
-</ul>
-<h2 id="outstanding-issues">Outstanding issues</h2>
-<ol>
-<li>
-<p>Can we use the history to implement anti-affinity: for any role with this
flag,
-use our knowledge of the cluster to ask for all nodes that aren't in use
already</p>
-</li>
-<li>
-<p>How to add blacklisting here? We are tracking failures and startup failures
-per node (not persisted), but not using this in role placement requests
yet.</p>
-</li>
+<h1
id="apache-slider-placement-how-slider-brings-back-nodes-in-the-same-location">Apache
Slider Placement: how Slider brings back nodes in the same location</h1>
+<h3 id="last-updated-2015-03-19">Last updated 2015-03-19</h3>
+<h2 id="changes">Changes</h2>
+<h3 id="slider-080-incubating">Slider 0.80-incubating</h3>
+<p>A major rework of placement has taken place, <a
href="https://issues.apache.org/jira/browse/SLIDER-611">Über-JIRA: placement
phase 2</a></p>
+<ol>
+<li>Slider manages the process of relaxing a request from a specific host to
"anywhere in the cluster".</li>
+<li>Each role/component type may have a configurable timeout for
<code>escalation</code> to begin.</li>
+<li>Slider periodically checks (every 30s by default) to see if there are
outstanding requests
+that have reached their escalation timeout and yet have not been
satisfied (see the sketch after this list).</li>
+<li>Such requests are cancelled and "relaxed" requests re-issued.</li>
+<li>Labels are always respected; even relaxed requests use any labels
specified in <code>resources.json</code></li>
+<li>If a node is considered unreliable (as per the Slider 0.70 changes), it is
not used in the initial
+request. </li>
+</ol>
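<p>A minimal sketch of the escalation review described in the list above. The
class, method and field names here are illustrative only and do not correspond
to the actual Slider implementation.</p>
<div class="codehilite"><pre>
// Illustrative sketch only: names are stand-ins, not real Slider classes.
class PlacementEscalator {

  static class OutstandingPlacedRequest {
    long issuedTime;          // when the node-specific request was issued
    long escalationTimeout;   // per-component timeout, in milliseconds
    boolean satisfied;        // set once YARN has granted a container
    boolean relaxed;          // already re-issued as a relaxed request?
  }

  // Called on every review interval (30s by default, per the list above).
  void reviewOutstandingRequests(OutstandingPlacedRequest[] requests, long now) {
    for (OutstandingPlacedRequest request : requests) {
      if (!request.satisfied
          && !request.relaxed
          && now - request.issuedTime >= request.escalationTimeout) {
        cancel(request);          // withdraw the node-specific request
        reissueRelaxed(request);  // same labels, but "anywhere in the cluster"
        request.relaxed = true;
      }
    }
  }

  void cancel(OutstandingPlacedRequest request) {
    // remove the placed container request from the AMRMClient
  }

  void reissueRelaxed(OutstandingPlacedRequest request) {
    // issue an equivalent request with no node restriction (relaxed locality)
  }
}
</pre></div>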
+<h4 id="strict-placement"><code>strict</code> placement</h4>
+<p>Again, "strict placement" has a different policy: once a component has been
deployed on a node,
+one component request will be made against that node, even if it is considered
unreliable. No
+relaxation of the request will ever take place.</p>
+<h4 id="none-placement"><code>none</code> placement</h4>
+<p>If the placement policy is "none", the request will always be relaxed.
+While tracking of recent failure counts takes place, it is not used in
placement requests.</p>
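<p>For illustration, both the per-component placement policy and the escalation
timeout are set per component in <code>resources.json</code>. The component
name, property names and values in the following sketch are assumptions for
illustration and should be checked against the release documentation.</p>
<div class="codehilite"><pre>
{
  "components": {
    "HBASE_REGIONSERVER": {
      "yarn.component.instances": "4",
      "yarn.component.placement.policy": "1",
      "yarn.placement.escalate.seconds": "60"
    }
  }
}
</pre></div>
<p>Here the (assumed) policy value selects strict placement for the component,
and escalation to a relaxed request begins 60 seconds after a placed request
goes unsatisfied.</p>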
+<h4 id="anti-affine-placement"><code>anti-affine</code> placement</h4>
+<p>There's still no explicit support for this in YARN or Slider. As noted
above, Slider does
+try to spread placement when rebuilding an application, but otherwise it
accepts which
+hosts YARN allocates containers on.</p>
+<h3 id="slider-070-incubating">Slider 0.70-incubating</h3>
+<p>The Slider AM tracks the recent failure count of each role on each (known)
YARN node; when
+the failure count of a role is above a role-specific threshold, no more
explicit requests are
+made for that host. Exception: "strict placement" component deployments are
always requested
+against the node irrespective of their history.</p>
+<p>This failure count is reset on a regular schedule to prevent a past history
of failures from contaminating
+Slider's placement decisions indefinitely. The fact that a node was unreliable
two weeks ago should
+not mean that it should be blacklisted this week. </p>
+<p>Note that this failure history is not persisted; it is lost on AM
start/restart.</p>
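<p>A minimal sketch of the 0.70 behaviour described above; the names and the
threshold value are illustrative, not those of the real Slider classes.</p>
<div class="codehilite"><pre>
// Illustrative sketch only: not the actual Slider implementation.
class NodeFailureTracker {

  static class RoleOnNode {
    int recentFailures;   // failures of this role on this node since the last reset
  }

  int failureThreshold = 3;   // role-specific threshold; the value here is arbitrary

  // Record a container failure of a role on a node.
  void onContainerFailed(RoleOnNode entry) {
    entry.recentFailures++;
  }

  // Placed requests skip unreliable nodes, except under "strict" placement.
  boolean canRequestOnNode(RoleOnNode entry, boolean strictPlacement) {
    return strictPlacement || failureThreshold > entry.recentFailures;
  }

  // Invoked on a regular schedule so that old failures stop influencing placement.
  // Because nothing is persisted, an AM restart has the same effect.
  void resetFailureCounts(RoleOnNode[] entries) {
    for (RoleOnNode entry : entries) {
      entry.recentFailures = 0;
    }
  }
}
</pre></div>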
+<h3 id="slider-060-incubating">Slider 0.60-incubating</h3>
+<ol>
+<li>Slider supports <code>strict placement</code>, in which a request is only
made for a previously
+used node and is never relaxed.</li>
+<li>Different component types can be assigned specific labels to be deployed
upon (see the snippet after this list). This allows
+cluster administrators to partition the YARN cluster into different
functions, with
+specific Slider Application instances only requesting components on
differently labelled sets of
+nodes. As such, it offers significant control of component placement, without
actually dedicating
+hardware to specific applications. Note that the HDFS filesystem is still
shared across the cluster,
+so the IO of in-cluster applications will still impinge on each other.</li>
</ol>
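<p>As a concrete illustration of the labels feature above, a label expression
can be attached to a component in <code>resources.json</code>; the component
name and property name below are assumptions for illustration.</p>
<div class="codehilite"><pre>
{
  "components": {
    "HBASE_REGIONSERVER": {
      "yarn.component.instances": "2",
      "yarn.label.expression": "hbase"
    }
  }
}
</pre></div>
<p>With such a definition, containers for that component would only be requested
on nodes carrying the <code>hbase</code> label, while other components remain
free to run anywhere.</p>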
<h2 id="introduction">Introduction</h2>
<p>Slider needs to bring up instances of a given role on the machine(s) on
which
-they last ran -it should remember after shrinking or freezing a cluster which
-servers were last used for a role -and use this (persisted) data to select
-clusters next time</p>
+they last ran. It must remember after shrinking or freezing an Application
Instance which
+servers were last used for a role. It must then try to use this (persisted)
data to select
+hosts next time.</p>
<p>It does this on the basis that the role instances prefer node-local
access to data previously persisted to HDFS. This is precisely the case
for Apache HBase, which can use Unix Domain Sockets to talk to the DataNode
without using the TCP stack. The HBase master persists to HDFS the tables
assigned to specific Region Servers, and when HBase is restarted its master
tries to reassign the same tables back to Region Servers on the same
machine.</p>
-<p>For this to work in a dynamic cluster, Slider needs to bring up Region
Servers
+<p>For this to work in a dynamic environment, Slider needs to bring up Region
Servers
on the previously used hosts, so that the HBase Master can re-assign the same
tables.</p>
<p>Note that it does not need to care about the placement of other roles, such
@@ -213,25 +242,30 @@ as the HBase masters -there anti-affinit
the key requirement.</p>
<h3 id="terminology">Terminology</h3>
<ul>
-<li><strong>Role Instance</strong> : a single instance of a role.</li>
-<li><strong>Node</strong> : A server in the YARN Physical (or potentially
virtual) Cluster of servers.</li>
-<li><strong>Slider Cluster</strong>: The set of role instances deployed by
Slider so as to
- create a single aggregate application.</li>
+<li><strong>Node</strong> : A server in the YARN Physical (or potentially
virtual) cluster of servers.</li>
+<li><strong>Application Instance</strong>: A deployment of a distributed
application as created and
+managed by Slider. </li>
+<li><strong>Component</strong>: a non-unique part of the distributed
application; that is, an application
+is defined as containing different components; zero or more
<strong>instances</strong> of each component
+may be declared as required in this Application Instance.
+Internally the term <code>role</code> is used in classes and variables; this
is the original term used,
+and so retained across much of the code base.</li>
+<li><strong>Component Instance</strong> : a single instance of a component.
</li>
<li><strong>Slider AM</strong>: The Application Master of Slider: the program
deployed by YARN to
-manage its Slider Cluster.</li>
+manage its Application Instance.</li>
<li><strong>RM</strong> YARN Resource Manager</li>
</ul>
<h3 id="assumptions">Assumptions</h3>
<p>Here are some assumptions in Slider's design</p>
<ol>
<li>
-<p>Instances of a specific role should preferably be deployed onto different
+<p>Instances of a specific component should preferably be deployed onto
different
servers. This enables Slider to only remember the set of server nodes onto
which instances were created, rather than more complex facts such as "two
Region
Servers were previously running on Node #17. On restart Slider can simply
request
one instance of a Region Server on a specific node, leaving the other instance
to be arbitrarily deployed by YARN. This strategy should help reduce the
<em>affinity</em>
-in the role deployment, so increase their resilience to failure.</p>
+in the component deployment, so increase their resilience to failure.</p>
</li>
<li>
<p>There is no need to make sophisticated choices on which nodes to request
@@ -240,53 +274,53 @@ instance and prioritizing nodes based on
only priority needed when asking for nodes is <em>ask for the most recently
used</em>.</p>
</li>
<li>
-<p>Different roles are independent: it is not an issue if a role of one type
+<p>Different roles are independent: it is not an issue if a component of one
type
(example, an Accumulo Monitor and an Accumulo Tablet Server) are on the same
host. This assumption allows Slider to only worry about affinity issues within
- a specific role, rather than across all roles.</p>
+ a specific component, rather than across all roles.</p>
</li>
<li>
-<p>After a cluster has been started, the rate of change of the cluster is
-low: both node failures and cluster flexing happen at the rate of every few
+<p>After an Application Instance has been started, the rate of change of the
application is
+low: both node failures and flexing happen at the rate of every few
hours, rather than every few seconds. This allows Slider to avoid needing
data structures and layout persistence code designed for regular and repeated
changes.</p>
</li>
<li>
<p>Instance placement is best-effort: if the previous placement cannot be
satisfied,
-the application will still perform adequately with role instances deployed
+the application will still perform adequately with component instances deployed
onto new servers. More specifically, if a previous server is unavailable
-for hosting a role instance due to lack of capacity or availability, Slider
+for hosting a component instance due to lack of capacity or availability,
Slider
will not decrement the number of instances to deploy: instead it will rely
on YARN to locate a new node -ideally on the same rack.</p>
</li>
<li>
-<p>If two instances of the same role do get assigned to the same server, it
+<p>If two instances of the same component do get assigned to the same server,
it
is not a failure condition. (This may be problematic for some roles
--we may need a role-by-role policy here, so that master nodes can be
anti-affine)
+-we may need a component-by-component policy here, so that master nodes can be
anti-affine)
[specifically, >1 HBase master node will not come up on the same host]</p>
</li>
<li>
-<p>If a role instance fails on a specific node, asking for a container on
+<p>If a component instance fails on a specific node, asking for a container on
that same node for the replacement instance is a valid recovery strategy.
This contains assumptions about failure modes -some randomness here may
be a valid tactic, especially for roles that do not care about locality.</p>
</li>
<li>
<p>Tracking failure statistics of nodes may be a feature to add in future;
-designing the Role History datastructures to enable future collection
+designing the RoleHistory datastructures to enable future collection
of rolling statistics on recent failures would be a first step to this </p>
</li>
</ol>
-<h3 id="the-role-history">The Role History</h3>
-<p>The <code>RoleHistory</code> is a datastructure which models the role
assignment, and
+<h3 id="the-rolehistory-datastructure">The RoleHistory Datastructure</h3>
+<p>The <code>RoleHistory</code> is a datastructure which models the component
assignment, and
can persist it to and restore it from the (shared) filesystem.</p>
<ul>
<li>
-<p>For each role, there is a list of cluster nodes which have supported this
role
+<p>For each component, there is a list of YARN cluster nodes which have
supported this component
used in the past.</p>
</li>
<li>
-<p>This history is used when selecting a node for a role.</p>
+<p>This history is used when selecting a node for a component.</p>
</li>
<li>
<p>This history remembers when nodes were allocated. These are re-requested
@@ -304,8 +338,8 @@ with YARN. This ensures that the same no
due to outstanding requests.</p>
</li>
<li>
-<p>It does not retain a complete history of the role -and does not need to.
-All it needs to retain is the recent history for every node onto which a role
+<p>It does not retain a complete history of the component -and does not need
to.
+All it needs to retain is the recent history for every node onto which a
component
instance has been deployed. Specifically, the last allocation or release
operation on a node is all that needs to be persisted.</p>
</li>
@@ -314,33 +348,33 @@ operation on a node is all that needs to
as active -as they were from the previous instance.</p>
</li>
<li>
-<p>On AM restart, nodes in the role history marked as active have to be
considered
-still active -the YARN RM will have to provide the full list of which are
not.</p>
+<p>On AM restart, nodes in the history marked as active have to be considered
+still active -the YARN RM will have to provide the full list of which are
not.</p>
</li>
<li>
-<p>During cluster flexing, nodes marked as released -and for which there is no
+<p>During flexing, nodes marked as released -and for which there is no
outstanding request - are considered candidates for requesting new
instances.</p>
</li>
<li>
-<p>When choosing a candidate node for hosting a role instance, it from the head
-of the time-ordered list of nodes that last ran an instance of that role</p>
+<p>When choosing a candidate node for hosting a component instance, it is taken
from the head
+of the time-ordered list of nodes that last ran an instance of that
component (see the sketch after this list)</p>
</li>
</ul>
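<p>A minimal sketch of the per-(node, component) record implied by the list
above; the field names follow the design text (<code>active</code>,
<code>requested</code>, <code>releasing</code>, <code>last_used</code>) but the
class itself is only illustrative, not the real Slider type.</p>
<div class="codehilite"><pre>
// Illustrative sketch: one record per (node, component) pair.
class NodeEntry {
  int active;       // containers of this component currently live on the node
  int requested;    // outstanding requests targeted at this node, not yet satisfied
  int releasing;    // live containers queued for release
  long lastUsed;    // timestamp of the last allocation or release on this node

  // Only the last allocation or release needs to be persisted, so lastUsed
  // (plus the active state at save time) is all the history that is written out.
}
</pre></div>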
<h3 id="persistence">Persistence</h3>
-<p>The state of the role is persisted to HDFS on changes -but not on cluster
+<p>The state of the component is persisted to HDFS on changes -but not on
Application Instance
termination.</p>
<ol>
-<li>When nodes are allocated, the Role History is marked as dirty</li>
-<li>When container release callbacks are received, the Role History is marked
as dirty</li>
-<li>When nodes are requested or a release request made, the Role History is
<em>not</em>
+<li>When nodes are allocated, the RoleHistory is marked as dirty</li>
+<li>When container release callbacks are received, the RoleHistory is marked
as dirty</li>
+<li>When nodes are requested or a release request made, the RoleHistory is
<em>not</em>
marked as dirty. This information is not relevant on AM restart.</li>
</ol>
<p>As at startup, a large number of allocations may arrive in a short period
of time,
-the Role History may be updated very rapidly -yet as the containers are
+the RoleHistory may be updated very rapidly -yet as the containers are
only recently activated, it is not likely that an immediately restarted Slider
-cluster would gain by re-requesting containers on them -their historical
+Application Instance would gain by re-requesting containers on them -their
historical
value is more important than their immediate past.</p>
-<p>Accordingly, the role history may be persisted to HDFS asynchronously, with
+<p>Accordingly, the RoleHistory may be persisted to HDFS asynchronously, with
the dirty bit triggering a flush of the state to HDFS. The datastructure
will still need to be synchronized for cross thread access, but the
sync operation will not be a major deadlock, compared to saving the file on
every
@@ -355,7 +389,7 @@ with the data saved in JSON or compresse
instances on this node; there is no blacklisting. As a central blacklist
for YARN has been proposed, it is hoped that this issue will be addressed
centrally,
without Slider having to remember which nodes are unreliable <em>for that
particular
-Slider cluster</em>.</p>
+Application Instance</em>.</p>
<p><strong>Anti-affinity</strong>: If multiple role instances are assigned to
the same node,
Slider has to choose on restart or flexing whether to ask for multiple
instances on that node again, or to pick other nodes. The assumed policy is
@@ -369,13 +403,13 @@ history of a node -maintaining some kind
node use and picking the heaviest used, or some other more-complex algorithm.
This may be possible, but we'd need evidence that the problem existed before
trying to address it.</p>
-<h1 id="the-nodemap-the-core-of-the-role-history">The NodeMap: the core of the
Role History</h1>
-<p>The core data structure, the <code>NodeMap</code> is a map of every known
node in the cluster, tracking
+<h1 id="the-nodemap">The NodeMap</h1>
+<p>A core data structure, the <code>NodeMap</code>, is a map of every known
node in the YARN cluster, tracking
how many containers are allocated to specific roles in it, and, when there
are no active instances, when it was last used. This history is used to
choose where to request new containers. Because of the asynchronous
-allocation and release of containers, the Role History also needs to track
-outstanding release requests --and, more critically, outstanding allocation
+allocation and release of containers, the History also needs to track
+outstanding release requests -and, more critically, outstanding allocation
requests. If Slider has already requested a container for a specific role
on a host, then asking for another container of that role would break
anti-affinity requirements. Note that not tracking outstanding requests would
@@ -397,7 +431,7 @@ let YARN choose.</li>
<li>Handles the multi-container on one node problem</li>
<li>By storing details about every role, cross-role decisions could be
possible</li>
<li>Simple counters can track the state of pending add/release requests</li>
-<li>Scales well to a rapidly flexing cluster</li>
+<li>Scales well to a rapidly flexing Application Instance</li>
<li>Simple to work with and persist</li>
<li>Easy to view and debug</li>
<li>Would support cross-role collection of node failures in future</li>
@@ -413,7 +447,7 @@ of recently explicitly released nodes ca
<li>Need to track outstanding requests against nodes, so that if a request
was satisfied on a different node, the original node's request count is
decremented, <em>not that of the node actually allocated</em>. </li>
-<li>In a virtual cluster, may fill with node entries that are no longer in the
cluster.
+<li>In a virtual YARN cluster, may fill with node entries that are no longer
in the cluster.
Slider should query the RM (or topology scripts?) to determine if nodes are
still
parts of the YARN cluster. </li>
</ul>
@@ -439,7 +473,7 @@ parts of the YARN cluster. </li>
<p>Maps a YARN NodeID record to a Slider <code>NodeInstance</code>
structure</p>
<h3 id="nodeinstance">NodeInstance</h3>
-<p>Every node in the cluster is modeled as an ragged array of
<code>NodeEntry</code> instances, indexed
+<p>Every node in the YARN cluster is modeled as a ragged array of
<code>NodeEntry</code> instances, indexed
by role index -</p>
<div class="codehilite"><pre><span class="n">NodeEntry</span><span
class="p">[</span><span class="n">roles</span><span class="p">]</span>
<span class="n">get</span><span class="p">(</span><span
class="n">roleId</span><span class="p">):</span> <span
class="n">NodeEntry</span> <span class="n">or</span> <span class="n">null</span>
@@ -488,7 +522,7 @@ so that the selection of containers to r
to index the specific role in a container so as to determine which role
has been offered in a container allocation message, and which role has
been released on a release event.</p>
-<p>The Role History needs to track outstanding requests, so that
+<p>The History needs to track outstanding requests, so that
when an allocation comes in, it can be mapped back to the original request.
Simply looking up the nodes on the provided container and decrementing
its request counter is not going to work -the container may be allocated
@@ -498,7 +532,7 @@ on a different node from that requested.
rolling integer -Slider will assume that after 2^24 requests per role, it can
be rolled,
-though as we will be retaining a list of outstanding requests, a clash should
not occur.
The main requirement is: not have > 2^24 outstanding requests for
instances of a specific role,
-which places an upper bound on the size of a Slider cluster.</p>
+which places an upper bound on the size of the Application Instance.</p>
<p>The splitting and merging will be implemented in a ContainerPriority class,
for uniform access.</p>
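<p>A sketch of the split described above. The 8-bit role / 24-bit request layout
is inferred from the "2^24 requests per role" limit in the text; the real
ContainerPriority class may pack the fields differently.</p>
<div class="codehilite"><pre>
// Illustrative sketch: merge a role index and a rolling request id into the
// single 32-bit YARN priority value, and split them back out again.
final class ContainerPriorityCodec {

  static final int REQUEST_MASK = 0xFFFFFF;   // low 24 bits: rolling request counter

  static int build(int roleId, int requestId) {
    return roleId * 0x1000000 + (requestId & REQUEST_MASK);
  }

  static int extractRole(int priority) {
    return priority >>> 24;                   // high 8 bits: role index
  }

  static int extractRequestId(int priority) {
    return priority & REQUEST_MASK;
  }
}
</pre></div>
<p>On an allocation or release callback the role can then be recovered from the
container's priority field, which is the lookup the text above requires.</p>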
<h3 id="outstandingrequest">OutstandingRequest</h3>
@@ -549,7 +583,7 @@ to 1 Sort + M list lookups.</p>
list of all Nodes which are available for an instance of that role,
using a comparator that places the most recently released node ahead of older
nodes.</p>
-<p>This list is not persisted -when a Slider Cluster is stopped it is moot,
and when
+<p>This list is not persisted -when the Application Instance is stopped it
is moot, and when
an AM is restarted this structure will be rebuilt.</p>
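<p>A sketch of the ordering just described: the node released most recently
sorts ahead of older ones. The class and field names are stand-ins, not the
real Slider types.</p>
<div class="codehilite"><pre>
// Illustrative sketch of the comparator behaviour for the available-node list.
class MostRecentlyReleasedFirst {

  static class NodeEntry {
    long lastUsed;   // time of the last allocation or release for this role
  }

  // Negative result: 'left' is offered before 'right', i.e. it was used more recently.
  static int compare(NodeEntry left, NodeEntry right) {
    return Long.compare(right.lastUsed, left.lastUsed);   // descending on lastUsed
  }
}
</pre></div>
<p>In practice this would be a <code>java.util.Comparator</code> passed to the
sort that rebuilds the list on AM restart.</p>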
<ol>
<li>When a node is needed for a new request, this list is consulted first.</li>
@@ -560,22 +594,22 @@ of the list for that role.</li>
</ol>
<p>If the list is empty during a container request operation, it means
that the Role History does not know of any nodes
-in the cluster that have hosted instances of that role and which are not
+in the YARN cluster that have hosted instances of that role and which are not
in use. There are then two possible strategies to select a node</p>
<ol>
-<li>Ask for an instance anywhere in the cluster (policy in Slider 0.5)</li>
+<li>Ask for an instance anywhere in the YARN cluster (policy in Slider
0.5)</li>
<li>Search the node map to identify other nodes which are (now) known about,
-but which are not hosting instances of a specific role -this can be used
+but which are not hosting instances of a specific role -this can be used
as the target for the next resource request.</li>
</ol>
-<p>Strategy #1 is simpler; Strategy #2 <em>may</em> decrease the affinity in
the cluster,
+<p>Strategy #1 is simpler; Strategy #2 <em>may</em> decrease the affinity of
component placement,
as the AM will be explicitly requesting an instance on a node which it knows
is not running an instance of that role.</p>
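<p>For illustration, the difference between an explicitly placed request and
strategy #1 can be expressed with the YARN client API; the resource size and
priority handling are omitted and the code is only a sketch, not Slider's own
request path.</p>
<div class="codehilite"><pre>
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;

// Illustrative sketch: how a node-specific and a relaxed request differ.
class PlacementRequests {

  // Placed request: ask only for the given host; relaxLocality=false stops YARN
  // from silently falling back to other hosts.
  AMRMClient.ContainerRequest placedRequest(Resource capability, Priority priority,
                                            String host) {
    return new AMRMClient.ContainerRequest(
        capability, new String[] {host}, null, priority, false);
  }

  // Strategy #1: no placement hints at all; YARN may pick any node in the cluster.
  AMRMClient.ContainerRequest relaxedRequest(Resource capability, Priority priority) {
    return new AMRMClient.ContainerRequest(capability, null, null, priority);
  }
}
</pre></div>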
<h4 id="issue-what-to-do-about-failing-nodes">ISSUE What to do about failing
nodes?</h4>
<p>Should a node whose container just failed be placed at the
top of the stack, ready for the next request? </p>
<p>If the container failed due to an unexpected crash in the application,
asking
-for that container back <em>is the absolute right strategy</em> -it will bring
+for that container back <em>is the absolute right strategy</em> -it will
bring
back a new role instance on that machine. </p>
<p>If the container failed because the node is now offline, the container
request
will not be satisfied by that node.</p>
@@ -585,10 +619,10 @@ then re-requesting containers on it will
<h3 id="bootstrap">Bootstrap</h3>
<p>Persistent Role History file not found; empty data structures created.</p>
<h3 id="restart">Restart</h3>
-<p>When starting a cluster, the Role History should be loaded. </p>
+<p>When starting an Application Instance, the Role History should be loaded.
</p>
<p>If the history is missing <em>or cannot be loaded for any reason</em>,
Slider must revert to the bootstrap actions.</p>
-<p>If found, the Role History will contain Slider's view of the Slider
Cluster's
+<p>If found, the Role History will contain Slider's view of the Application
Instance's
state at the time the history was saved, explicitly recording the last-used
time of all nodes no longer hosting a role's container. By noting which roles
were actually being served, it implicitly notes which nodes have a
<code>last_used</code>
@@ -597,9 +631,9 @@ all node entries listed as having active
saved must have more recent data than those nodes listed as inactive.</p>
<p>When rebuilding the data structures, the fact that nodes were active at
save time must be converted into the data that indicates that the nodes
-were at least in use <em>at the time the data was saved</em>. The state of the
cluster
-after the last save is unknown.</p>
-<p>1: Role History loaded; Failure => Bootstrap.
+were at least in use <em>at the time the data was saved</em>.
+The state of the Application Instance after the last save is unknown.</p>
+<p>1: History loaded; Failure => Bootstrap.
2: Future: if role list enum != current enum, remapping could take place.
Until then: fail.
3: Mark all nodes as active at save time to that of the</p>
<div class="codehilite"><pre><span class="c1">//define a threshold</span>
@@ -645,26 +679,21 @@ from the list of available nodes</p>
</pre></div>
-<p>There's no need to resort the available node list -all that has happened
+<p>There's no need to re-sort the available node list -all that has happened
is that some entries have been removed</p>
-<p><strong>Issue</strong>: what if requests come in for a <code>(role,
requestID)</code> for
-the previous instance of the AM? Could we just always set the initial
-requestId counter to a random number and hope the collision rate is very, very
-low (2^24 * #(outstanding_requests)). If YARN-1041 ensures that
-a restarted AM does not receive outstanding requests, this issue goes away.</p>
<h3 id="teardown">Teardown</h3>
<ol>
<li>If dirty, save role history to its file.</li>
<li>Issue release requests</li>
-<li>Maybe update data structures on responses, but do not mark Role History
+<li>Maybe update data structures on responses, but do not mark RoleHistory
as dirty or flush it to disk.</li>
</ol>
<p>This strategy is designed to eliminate the expectation that there will ever
-be a clean shutdown -and so that the startup-time code should expect
-the Role History to have been written during shutdown. Instead the code
+be a clean shutdown -and so the startup-time code should not expect
+the RoleHistory data to have been written during shutdown. Instead the code
should assume that the history was saved to disk at some point during the life
-of the Slider Cluster -ideally after the most recent change, and that the
information
-in it is only an approximate about what the previous state of the cluster
was.</p>
+of the Application Instance -ideally after the most recent change, and that
the information
+in it is only an approximation of what the previous state of the Application
Instance was.</p>
<h3 id="flex-requesting-a-container-in-role-role">Flex: Requesting a container
in role <code>role</code></h3>
<div class="codehilite"><pre><span class="n">node</span> <span
class="p">=</span> <span class="n">availableNodes</span><span
class="p">[</span><span class="n">roleId</span><span class="p">].</span><span
class="n">pop</span><span class="p">()</span>
<span class="k">if</span> <span class="n">node</span> !<span
class="p">=</span> <span class="n">null</span> <span class="p">:</span>
@@ -679,7 +708,7 @@ in it is only an approximate about what
<p>There is a bias here towards previous nodes, even if the number of nodes
-in the cluster has changed. This is why a node is picked where the number
+in the YARN cluster has changed. This is why a node is picked where
the number
of <code>active-releasing == 0 and requested == 0</code>, rather than where it
is simply the lowest
value of <code>active + requested - releasing</code>: if there is no node in
the nodemap that
is not running an instance of that role, it is left to the RM to decide where
@@ -687,11 +716,11 @@ the role instance should be instantiated
<p>This bias towards previously used nodes also means that (lax) requests
will be made of nodes that are currently unavailable either because they
are offline or simply overloaded with other work. In such circumstances,
-the node will have an active count of zero -so the search will find these
-nodes and request them -even though the requests cannot be satisfied.
+the node will have an active count of zero -so the search will find these
+nodes and request them -even though the requests cannot be satisfied.
As a result, the request will be downgraded to a rack-local or cluster-wide,
-request -an acceptable degradation on a cluster where all the other entries
-in the nodemap have instances of that specific node -but not when there are
+request -an acceptable degradation on an Application Instance where all the
other entries
+in the nodemap have instances of that specific role -but not when there are
empty nodes. </p>
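<p>A sketch of the selection rule described above, where only a node with
<code>active-releasing == 0 and requested == 0</code> for the role is a
candidate for a placed request; the names are stand-ins for the real Slider
classes.</p>
<div class="codehilite"><pre>
// Illustrative sketch: walk the time-ordered available list and return the
// first node that is genuinely idle for this role, or null to fall back to a
// relaxed, let-YARN-choose request.
class CandidateNodeSelector {

  static class NodeEntry {
    int active, requested, releasing;
  }

  static NodeEntry findNodeForNewInstance(NodeEntry[] availableForRole) {
    for (NodeEntry entry : availableForRole) {
      boolean idle = (entry.active - entry.releasing) == 0 && entry.requested == 0;
      if (idle) {
        return entry;   // most recently used idle node, given the list ordering
      }
    }
    return null;        // no recommendation: issue a relaxed request instead
  }
}
</pre></div>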
<h4 id="solutions">Solutions</h4>
<ol>
@@ -701,7 +730,7 @@ iterate through the values. This would p
node from being requested first.</p>
</li>
<li>
-<p>Keep track of requests, perhaps through a last-requested counter -and use
+<p>Keep track of requests, perhaps through a last-requested counter -and use
this in the selection process. This would radically complicate the selection
algorithm, and would not even distinguish "node recently released that was
also the last requested" from "node that has not recently satisfied requests
@@ -718,13 +747,13 @@ is used in place of a recent one and the
<p>But there are consequences:</p>
<p><strong>Performance</strong>:</p>
<p>Using the history to pick a recent node may increase selection times on a
-large cluster, as for every instance needed, a scan of all nodes in the
+large Application Instance, as for every instance needed, a scan of all nodes
in the
nodemap is required (unless there is some clever bulk assignment list being
built
up), or a sorted version of the nodemap is maintained, with a node placed
at the front of this list whenever it is updated.</p>
<p><strong>Startup-time problems</strong></p>
<p>There is also the risk that while starting an application instance, the
<code>rolehistory.saved</code>
-flag may be updated while the cluster flex is in progress, so making the saved
+flag may be updated while the flex is in progress, so making the saved
nodes appear out of date. Perhaps the list of recently released nodes could
be rebuilt at startup time.</p>
<p>The proposed <code>recentlyReleasedList</code> addresses this, though it
creates
@@ -737,7 +766,7 @@ from the last-used fields in the node en
<p>This is the callback received when containers have been allocated.
Due to (apparently) race conditions, the AM may receive duplicate
-container allocations -Slider already has to recognize this and
+container allocations -Slider already has to recognize this and
currently simply discards any surplus.</p>
<p>If the AM tracks outstanding requests made for specific hosts, it
will need to correlate allocations with the original requests, so as to
decrement
@@ -799,7 +828,7 @@ other sync problem.</p>
<li>
<p>The node selected for the original request has its request for a role
instance
decremented, so that it may be viewed as available again. The node is also
-re-inserted into the AvailableNodes list -not at its head, but at its position
+re-inserted into the AvailableNodes list -not at its head, but at its
position
in the total ordering of the list.</p>
</li>
</ol>
@@ -814,11 +843,11 @@ in the associated <code>RoleInstance</co
adjusted to indicate it has one more live node and one less starting node.</p>
<h3 id="nmclientasync-callback-oncontainerstartfailed">NMClientAsync Callback:
onContainerStartFailed()</h3>
<p>The AM uses this as a signal to remove the container from the list
-of starting containers -the count of starting containers for the relevant
+of starting containers -the count of starting containers for the relevant
NodeEntry is decremented. If the node is now available for instances of this
container, it is returned to the queue of available nodes.</p>
-<h3 id="flex-releasing-a-role-instance-from-the-cluster">Flex: Releasing a
role instance from the cluster</h3>
-<p>Simple strategy: find a node with at least one active container</p>
+<h3
id="flex-releasing-a-component-instance-from-the-application-instance">Flex:
Releasing a Component instance from the Application Instance</h3>
+<p>Simple strategy: find a node with at least one active container of that
type</p>
<div class="codehilite"><pre><span class="n">select</span> <span
class="n">a</span> <span class="n">node</span> <span class="n">N</span> <span
class="n">in</span> <span class="n">nodemap</span> <span class="n">where</span>
<span class="k">for</span> <span class="n">NodeEntry</span><span
class="p">[</span><span class="n">roleId</span><span class="p">]:</span> <span
class="n">active</span> <span class="o">></span> <span
class="n">releasing</span><span class="p">;</span>
<span class="n">nodeentry</span> <span class="p">=</span> <span
class="n">node</span><span class="p">.</span><span class="n">get</span><span
class="p">(</span><span class="n">roleId</span><span class="p">)</span>
<span class="n">nodeentry</span><span class="p">.</span><span
class="n">active</span><span class="o">--</span><span class="p">;</span>
@@ -888,7 +917,7 @@ has completed although it wasn't on the
<p>By calling <code>reviewRequestAndReleaseNodes()</code> the AM triggers
-a re-evaluation of how many instances of each node a cluster has, and how many
+a re-evaluation of how many instances of each role an Application Instance
has, and how many
it needs. If a container has failed and that freed up all role instances
on that node, it will have been inserted at the front of the
<code>availableNodes</code> list.
As a result, it is highly likely that a new container will be requested on
@@ -898,10 +927,10 @@ be if other containers were completed in
<p>Notes made while implementing the design.</p>
<p><code>OutstandingRequestTracker</code> should also track requests made with
no target node; this makes seeing what is going on easier.
<code>ARMClientImpl</code>
-is doing something similar, on a priority-by-priority basis -if many
+is doing something similar, on a priority-by-priority basis -if many
requests are made, each with their own priority, that base class's hash tables
may get overloaded. (it assumes a limited set of priorities)</p>
-<p>Access to the role history datastructures was restricted to avoid
+<p>Access to the RoleHistory datastructures was restricted to avoid
synchronization problems. Protected access is permitted so that a
test subclass can examine (and change?) the internals.</p>
<p>`NodeEntries need to add a launching value separate from active so that
@@ -909,7 +938,7 @@ when looking for nodes to release, no at
a node that has been allocated but is not yet live.</p>
<p>We can't reliably map from a request to a response. Does that matter?
If we issue a request for a host and it comes in on a different port, do we
-care? Yes -but only because we are trying to track nodes which have requests
+care? Yes -but only because we are trying to track nodes which have requests
outstanding so as not to issue new ones. But if we just pop the entry
off the available list, that becomes moot.</p>
<p>Proposal: don't track the requesting numbers in the node entries, just
@@ -918,7 +947,7 @@ in the role status fields.</p>
node on them was requested but not satisfied.</p>
<p>Other issues: should we place nodes on the available list as soon as all
the entries
have been released? I.e. Before YARN has replied</p>
-<p>RoleStats were removed -left in app state. Although the rolestats would
+<p>RoleStats were removed -left in app state. Although the rolestats would
belong here, leaving them where they were reduced the amount of change
in the <code>AppState</code> class, so risk of something breaking.</p>
<h2 id="miniyarncluster-node-ids">MiniYARNCluster node IDs</h2>
@@ -927,14 +956,14 @@ against file://; so mini tests with >
<code>NodeId:NodeInstance</code>. What will happen is that
<code>NodeInstance getOrCreateNodeInstance(Container container) '
will always return the same (now shared)</code>NodeInstance`.</p>
-<h2 id="releasing-containers-when-shrinking-a-cluster">Releasing Containers
when shrinking a cluster</h2>
+<h2 id="releasing-containers-when-shrinking-an-application-instance">Releasing
Containers when shrinking an Application Instance</h2>
<p>When identifying instances to release in a bulk downscale operation, the
full
list of targets must be identified together. This is not just to eliminate
multiple scans of the data structures, but because the containers are not
-released until the queued list of actions are executed -the nodes'
release-in-progress
+released until the queued list of actions is executed -the nodes'
release-in-progress
counters will not be incremented until after all the targets have been
identified.</p>
<p>It also needs to handle the scenario where there are many role instances on
a
-single server -it should prioritize those. </p>
+single server -it should prioritize those. </p>
<p>The NodeMap/NodeInstance/NodeEntry structure is adequate for identifying
nodes,
at least provided there is a 1:1 mapping of hostname to NodeInstance. But it
is not enough to track containers in need of release: the AppState needs
@@ -950,24 +979,24 @@ in the container</p>
role should be dropped from the list.</p>
<p>This can happen when an instance was allocated on a different node from
that requested.</p>
-<h3
id="finding-a-node-when-a-role-has-instances-in-the-cluster-but-nothing">Finding
a node when a role has instances in the cluster but nothing</h3>
-<p>known to be available</p>
+<h3
id="finding-a-node-when-a-component-has-some-live-instances-and-no-available-hosts-known-to">Finding
a node when a component has some live instances and no available hosts known
to Slider</h3>
<p>One condition found during testing is the following: </p>
<ol>
-<li>A role has one or more instances running in the cluster</li>
+<li>A role has one or more instances running in the YARN cluster</li>
<li>A role has no entries in its available list: there is no history of the
role ever being on nodes other than those currently in use.</li>
<li>A new instance is requested.</li>
</ol>
<p>In this situation, the <code>findNodeForNewInstance</code> method returns
null: there
is no recommended location for placement. However, this is untrue: all
-nodes in the cluster <code>other</code> than those in use are the recommended
nodes. </p>
-<p>It would be possible to build up a list of all known nodes in the cluster
that
+nodes in the YARN cluster <code>other</code> than those in use are the
recommended nodes. </p>
+<p>It would be possible to build up a list of all known nodes in the YARN
cluster that
are not running this role and use that in the request, effectively telling the
AM to pick one of the idle nodes. By not doing so, we increase the probability
that another instance of the same role will be allocated on a node in use,
a probability which (were there capacity on these nodes and placement random) would
be
-<code>1/(clustersize-roleinstances)</code>. The smaller the cluster and the
bigger the
+<code>1/(clustersize-roleinstances)</code>. The smaller the YARN cluster and
the bigger the
application, the higher the risk.</p>
<p>This could be revisited, if YARN does not support anti-affinity between new
requests at a given priority and existing ones: the solution would be to
@@ -995,17 +1024,17 @@ the ordering of that list in most-recent
the set of outstanding requests will be retained, so all these hosts in the list
will be considered unavailable for new location-specific requests.
This may imply that new requests that could be explicitly placed will now only
+be randomly placed -however, it is moot on the basis that if there are
outstanding
+be randomly placed âhowever, it is moot on the basis that if there are
outstanding
container requests it means the RM cannot grant resources: new requests at the
same priority (i.e. same Slider Role ID) will not be granted either.</p>
<p>The only scenario where this would be different is if the resource
requirements
-of instances of the target role were decreated during a cluster flex such that
+of instances of the target role were decreased during a flex such that
the placement could now be satisfied on the target host. This is not considered
a significant problem.</p>
<h1 id="persistence_1">Persistence</h1>
<p>The initial implementation uses the JSON-formatted Avro format; while
significantly
less efficient than a binary format, it is human-readable</p>
-<p>Here are sequence of entries from a test run on a single node cluster;
running 1 HBase Master
+<p>Here is a sequence of entries from a test run on a single-node YARN cluster,
running 1 HBase Master
and two region servers.</p>
<p>Initial save; the instance of Role 1 (HBase master) is live, Role 2 (RS) is
not.</p>
<div class="codehilite"><pre><span class="p">{</span>"<span
class="n">entry</span>"<span class="p">:{</span>"<span
class="n">org</span><span class="p">.</span><span class="n">apache</span><span
class="p">.</span><span class="n">hoya</span><span class="p">.</span><span
class="n">avro</span><span class="p">.</span><span
class="n">RoleHistoryHeader</span>"<span class="p">:{</span>"<span
class="n">version</span>"<span class="p">:</span>1<span
class="p">,</span>"<span class="n">saved</span>"<span
class="p">:</span>1384183475949<span class="p">,</span>"<span
class="n">savedx</span>"<span class="p">:</span>"14247<span
class="n">c3aeed</span>"<span class="p">,</span>"<span
class="n">roles</span>"<span class="p">:</span>3<span class="p">}}}</span>
@@ -1028,8 +1057,8 @@ and two region servers.</p>
</pre></div>
-<p>At this point the cluster was stopped and started.</p>
-<p>When the cluster is restarted, every node that was active for a role at the
time the file was saved <code>1384183476028</code>
+<p>At this point the Application Instance was stopped and started.</p>
+<p>When the Application Instance is restarted, every node that was active for
a role at the time the file was saved <code>1384183476028</code>
is given a last_used timestamp of that time. </p>
<p>When the history is next saved, the master has come back onto the (single)
node,
it is active while its <code>last_used</code> timestamp is the previous file's
timestamp.
@@ -1054,7 +1083,7 @@ No region servers are yet live.</p>
</pre></div>
-<p>The <code>last_used</code> timestamps will not be changed until the cluster
is shrunk or restarted, as the <code>active</code> flag being set
+<p>The <code>last_used</code> timestamps will not be changed until the
Application Instance is shrunk or restarted, as the <code>active</code> flag
being set
implies that the server is running both roles at the save time of
<code>1384183512217</code>.</p>
<h2 id="resolved-issues">Resolved issues</h2>
<blockquote>