Author: buildbot
Date: Thu Mar 19 16:48:45 2015
New Revision: 944362

Log:
Staging update by buildbot for slider

Modified:
    websites/staging/slider/trunk/content/   (props changed)
    websites/staging/slider/trunk/content/design/rolehistory.html

Propchange: websites/staging/slider/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Mar 19 16:48:45 2015
@@ -1 +1 @@
-1667103
+1667813

Modified: websites/staging/slider/trunk/content/design/rolehistory.html
==============================================================================
--- websites/staging/slider/trunk/content/design/rolehistory.html (original)
+++ websites/staging/slider/trunk/content/design/rolehistory.html Thu Mar 19 
16:48:45 2015
@@ -177,35 +177,64 @@ Latest release: <strong>0.60.0-incubatin
    limitations under the License.
 -->
 
-<h1 
id="apache-slider-role-history-how-slider-brings-back-nodes-in-the-same-location">Apache
 Slider Role History: how Slider brings back nodes in the same location</h1>
-<h3 id="last-updated-2013-12-06">Last updated  2013-12-06</h3>
-<ul>
-<li>This document uses the pre-slider terminology of role/cluster and not
-component and application instance *</li>
-</ul>
-<h2 id="outstanding-issues">Outstanding issues</h2>
-<ol>
-<li>
-<p>Can we use the history to implement anti-affinity: for any role with this 
flag,
-use our knowledge of the cluster to ask for all nodes that aren't in use 
already</p>
-</li>
-<li>
-<p>How to add blacklisting here? We are tracking failures and startup failures
-per node (not persisted), but not using this in role placement requests 
yet.</p>
-</li>
+<h1 
id="apache-slider-placement-how-slider-brings-back-nodes-in-the-same-location">Apache
 Slider Placement: how Slider brings back nodes in the same location</h1>
+<h3 id="last-updated-2015-03-19">Last updated  2015-03-19</h3>
+<h2 id="changes">Changes</h2>
+<h3 id="slider-080-incubating">Slider 0.80-incubating</h3>
+<p>A major rework of placement has taken place, <a 
href="https://issues.apache.org/jira/browse/SLIDER-611";>Über-JIRA : placement 
phase 2</a></p>
+<ol>
+<li>Slider manages the process of relaxing a request from a specific host to 
"anywhere in the cluster".</li>
+<li>Each role/component type may have a configurable timeout for 
<code>escalation</code> to begin (see the example after this list).</li>
+<li>Slider periodically checks (every 30s by default) to see if there are 
outstanding requests
+that have reached their escalation timeout and yet have not been 
satisfied.</li>
+<li>Such requests are cancelled and "relaxed" requests re-issued.</li>
+<li>Labels are always respected; even relaxed requests use any labels 
specified in <code>resources.json</code></li>
+<li>If a node is considered unreliable (as per the Slider 0.70 changes), it is 
not used in the initial
+request. </li>
+</ol>
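+<p>As an illustration of items 2 and 5 above, both the escalation timeout and the 
label expression are per-component settings in <code>resources.json</code>. A minimal 
sketch follows; treat the exact property names (<code>yarn.component.placement.policy</code>, 
<code>yarn.placement.escalate.seconds</code>, <code>yarn.label.expression</code>) as 
indicative and check them against the Slider configuration documentation.</p>
+<div class="codehilite"><pre>{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {},
  "global": {},
  "components": {
    "worker": {
      "yarn.component.instances": "4",
      "yarn.component.placement.policy": "0",
      "yarn.placement.escalate.seconds": "60",
      "yarn.label.expression": "hbase"
    }
  }
}
</pre></div>
+<p>With such settings, a placement request for the worker component would stay 
host-specific for 60 seconds before being relaxed, and even the relaxed request 
would still carry the "hbase" label.</p>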
+<h4 id="strict-placement"><code>strict</code> placement</h4>
+<p>Again, "strict placement" has a different policy: once a component has been 
deployed on a node,
+one component request will be made against that node, even if it is considered 
unreliable. No
+relaxation of the request will ever take place.</p>
+<h4 id="none-placement"><code>none</code> placement</h4>
+<p>If the placement policy is "none", the request will always be relaxed.
+While tracking of recent failure counts takes place, it is not used in 
placement requests.</p>
+<h4 id="anti-affine-placement"><code>anti-affine</code> placement</h4>
+<p>There's still no explicit support for this in YARN or Slider. As noted 
above, Slider does
+try to spread placement when rebuilding an application, but otherwise it 
accepts which
+hosts YARN allocates containers on.</p>
+<h3 id="slider-070-incubating">Slider 0.70-incubating</h3>
+<p>The Slider AM tracks the recent failure count of each role on each (known) 
YARN node; when
+the failure count of a role is above a role-specific threshold, no more 
explicit requests are
+made for that host. Exception: "strict placement" component deployments are 
always requested
+against the node irrespective of their history.</p>
+<p>This failure count is reset on a regular schedule, to stop a past history of 
failures from contaminating
+Slider's placement history indefinitely. The fact that a node was unreliable 
two weeks ago should
+not mean that it should be blacklisted this week. </p>
+<p>Note that this failure history is not persisted; it is lost on AM 
start/restart.</p>
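<p>A minimal sketch of this 0.70 behaviour, assuming hypothetical class and method 
names rather than the actual Slider code: per-(host, role) failure counters gate 
explicit placement requests, strict placement bypasses the check, and a scheduled 
reset clears the history (which, being in-memory only, also vanishes on AM restart).</p>
<div class="codehilite"><pre>import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only: recent-failure tracking per (host, role). */
class NodeFailureTracker {
  // in-memory failure counts keyed by "hostname/roleId"; never persisted
  private final Map&lt;String, Integer&gt; recentFailures = new ConcurrentHashMap&lt;&gt;();

  void recordFailure(String host, int roleId) {
    recentFailures.merge(host + "/" + roleId, 1, Integer::sum);
  }

  /** A host is only explicitly requested while below the role's threshold. */
  boolean isRequestable(String host, int roleId, int threshold, boolean strict) {
    if (strict) {
      return true; // strict placement ignores the failure history
    }
    return recentFailures.getOrDefault(host + "/" + roleId, 0) &lt; threshold;
  }

  /** Run on a schedule, so old unreliability cannot blacklist a node forever. */
  void resetAll() {
    recentFailures.clear();
  }
}
</pre></div>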
+<h3 id="slider-060-incubating">Slider 0.60-incubating</h3>
+<ol>
+<li>Slider supports <code>strict placement</code>, in which a request is only 
made for a previously
used node, and is never relaxed.</li>
+<li>Different component types can be assigned specific labels to be deployed 
upon. This allows
+cluster administrators to partition the YARN cluster into different 
functions, with
+specific Slider Application instances only requesting components on 
differently labelled sets of
+nodes. As such, it offers significant control of component placement, without 
actually dedicating
+hardware to specific applications. Note that the HDFS filesystem is still 
shared across the cluster,
+so the IO load of different in-cluster applications will still impinge on each other.</li>
 </ol>
 <h2 id="introduction">Introduction</h2>
 <p>Slider needs to bring up instances of a given role on the machine(s) on 
which
-they last ran -it should remember after shrinking or freezing a cluster  which
-servers were last used for a role -and use this (persisted) data to select
-clusters next time</p>
+they last ran. It must remember, after shrinking or freezing an Application 
Instance, which
servers were last used for a role. It must then try to use this (persisted) 
data to select
hosts next time.</p>
 <p>It does this on the basis that the role instances prefer node-local
 access to data previously persisted to HDFS. This is precisely the case
 for Apache HBase, which can use Unix Domain Sockets to talk to the DataNode
 without using the TCP stack. The HBase master persists to HDFS the tables
 assigned to specific Region Servers, and when HBase is restarted its master
 tries to reassign the same tables back to Region Servers on the same 
machine.</p>
-<p>For this to work in a dynamic cluster, Slider needs to bring up Region 
Servers
+<p>For this to work in a dynamic environment, Slider needs to bring up Region 
Servers
 on the previously used hosts, so that the HBase Master can re-assign the same
 tables.</p>
 <p>Note that it does not need to care about the placement of other roles, such
@@ -213,25 +242,30 @@ as the HBase masters -there anti-affinit
 the key requirement.</p>
 <h3 id="terminology">Terminology</h3>
 <ul>
-<li><strong>Role Instance</strong> : a single instance of a role.</li>
-<li><strong>Node</strong> : A server in the YARN Physical (or potentially 
virtual) Cluster of servers.</li>
-<li><strong>Slider Cluster</strong>: The set of role instances deployed by 
Slider so as to 
- create a single aggregate application.</li>
+<li><strong>Node</strong> : A server in the YARN Physical (or potentially 
virtual) cluster of servers.</li>
+<li><strong>Application Instance</strong>: A deployment of a distributed 
application as created and
+managed by Slider. </li>
+<li><strong>Component</strong> : a non-unique part of the distributed 
application; that is, an application
+is defined as containing different components; zero or more 
<strong>instances</strong> of each component
+may be declared as required in this Application Instance.
+Internally the term <code>role</code> is used in classes and variables; this 
is the original term used,
+and so retained across much of the code base.</li>
+<li><strong>Component Instance</strong> : a single instance of a component. 
</li>
 <li><strong>Slider AM</strong>: The Application Master of Slider: the program 
deployed by YARN to
-manage its Slider Cluster.</li>
+manage its Application Instance.</li>
 <li><strong>RM</strong> YARN Resource Manager</li>
 </ul>
 <h3 id="assumptions">Assumptions</h3>
 <p>Here are some assumptions in Slider's design</p>
 <ol>
 <li>
-<p>Instances of a specific role should preferably be deployed onto different
+<p>Instances of a specific component should preferably be deployed onto 
different
 servers. This enables Slider to only remember the set of server nodes onto
 which instances were created, rather than more complex facts such as "two 
Region
 Servers were previously running on Node #17". On restart Slider can simply 
request
 one instance of a Region Server on a specific node, leaving the other instance
 to be arbitrarily deployed by YARN. This strategy should help reduce the 
<em>affinity</em>
-in the role deployment, so increase their resilience to failure.</p>
+in the component deployment, and so increase their resilience to failure.</p>
 </li>
 <li>
 <p>There is no need to make sophisticated choices on which nodes to request
@@ -240,53 +274,53 @@ instance and prioritizing nodes based on
 only priority needed when asking for nodes is <em>ask for the most recently 
used</em>.</p>
 </li>
 <li>
-<p>Different roles are independent: it is not an issue if a role of one type
+<p>Different roles are independent: it is not an issue if a component of one 
type
  (example, an Accumulo Monitor and an Accumulo Tablet Server) are on the same
  host. This assumption allows Slider to only worry about affinity issues within
- a specific role, rather than across all roles.</p>
+ a specific component, rather than across all roles.</p>
 </li>
 <li>
-<p>After a cluster has been started, the rate of change of the cluster is
-low: both node failures and cluster flexing happen at the rate of every few
+<p>After an Application Instance has been started, the rate of change of the 
application is
+low: both node failures and flexing happen at the rate of every few
 hours, rather than every few seconds. This allows Slider to avoid needing
 data structures and layout persistence code designed for regular and repeated 
changes.</p>
 </li>
 <li>
 <p>Instance placement is best-effort: if the previous placement cannot be 
satisfied,
-the application will still perform adequately with role instances deployed
+the application will still perform adequately with component instances deployed
 onto new servers. More specifically, if a previous server is unavailable
-for hosting a role instance due to lack of capacity or availability, Slider
+for hosting a component instance due to lack of capacity or availability, 
Slider
 will not decrement the number of instances to deploy: instead it will rely
 on YARN to locate a new node -ideally on the same rack.</p>
 </li>
 <li>
-<p>If two instances of the same role do get assigned to the same server, it
+<p>If two instances of the same component do get assigned to the same server, 
it
 is not a failure condition. (This may be problematic for some roles 
--we may need a role-by-role policy here, so that master nodes can be 
anti-affine)
+-we may need a component-by-component policy here, so that master nodes can be 
anti-affine)
 [specifically, &gt;1 HBase master node will not come up on the same host]</p>
 </li>
 <li>
-<p>If a role instance fails on a specific node, asking for a container on
+<p>If a component instance fails on a specific node, asking for a container on
 that same node for the replacement instance is a valid recovery strategy.
 This contains assumptions about failure modes -some randomness here may
 be a valid tactic, especially for roles that do not care about locality.</p>
 </li>
 <li>
 <p>Tracking failure statistics of nodes may be a feature to add in future;
-designing the Role History datastructures to enable future collection
+designing the RoleHistory datastructures to enable future collection
 of rolling statistics on recent failures would be a first step to this </p>
 </li>
 </ol>
-<h3 id="the-role-history">The Role History</h3>
-<p>The <code>RoleHistory</code> is a datastructure which models the role 
assignment, and
+<h3 id="the-rolehistory-datastructure">The RoleHistory Datastructure</h3>
+<p>The <code>RoleHistory</code> is a datastructure which models the component 
assignment, and
 can persist it to and restore it from the (shared) filesystem.</p>
 <ul>
 <li>
-<p>For each role, there is a list of cluster nodes which have supported this 
role
+<p>For each component, there is a list of YARN cluster nodes which have 
hosted instances of this component
in the past.</p>
 </li>
 <li>
-<p>This history is used when selecting a node for a role.</p>
+<p>This history is used when selecting a node for a component.</p>
 </li>
 <li>
 <p>This history remembers when nodes were allocated. These are re-requested
@@ -304,8 +338,8 @@ with YARN. This ensures that the same no
 due to outstanding requests.</p>
 </li>
 <li>
-<p>It does not retain a complete history of the role -and does not need to.
-All it needs to retain is the recent history for every node onto which a role
+<p>It does not retain a complete history of the component -and does not need 
to.
+All it needs to retain is the recent history for every node onto which a 
component
 instance has been deployed. Specifically, the last allocation or release
 operation on a node is all that needs to be persisted.</p>
 </li>
@@ -314,33 +348,33 @@ operation on a node is all that needs to
 as active -as they were from the previous instance.</p>
 </li>
 <li>
-<p>On AM restart, nodes in the role history marked as active have to be 
considered
-still active -the YARN RM will have to provide the full list of which are 
not.</p>
+<p>On AM restart, nodes in the history marked as active have to be considered
+still active —the YARN RM will have to provide the full list of which are 
not.</p>
 </li>
 <li>
-<p>During cluster flexing, nodes marked as released -and for which there is no
+<p>During flexing, nodes marked as released -and for which there is no
 outstanding request - are considered candidates for requesting new 
instances.</p>
 </li>
 <li>
-<p>When choosing a candidate node for hosting a role instance, it from the head
-of the time-ordered list of nodes that last ran an instance of that role</p>
+<p>When choosing a candidate node for hosting a component instance, it is taken 
from the head
of the time-ordered list of nodes that last ran an instance of that 
component, as sketched after this list.</p>
 </li>
 </ul>
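<p>A sketch of the head-of-list selection in the final bullet above, with 
illustrative names (the real RoleHistory types differ): each component keeps a 
most-recent-first list of nodes it has run on, and candidates are popped from the head.</p>
<div class="codehilite"><pre>import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative: most-recently-used node selection for one component type. */
class ComponentPlacementHistory {
  // head of the deque = node most recently used by this component
  private final Deque&lt;String&gt; recentNodes = new ArrayDeque&lt;&gt;();

  void nodeReleased(String hostname) {
    recentNodes.remove(hostname);   // keep each node listed at most once
    recentNodes.addFirst(hostname); // most recent goes to the head
  }

  /** Returns a placement hint, or null to let YARN choose anywhere. */
  String candidateNode() {
    return recentNodes.pollFirst();
  }
}
</pre></div>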
 <h3 id="persistence">Persistence</h3>
-<p>The state of the role is persisted to HDFS on changes -but not on cluster
+<p>The state of the component is persisted to HDFS on changes —but not on 
Application Instance
 termination.</p>
 <ol>
-<li>When nodes are allocated, the Role History is marked as dirty</li>
-<li>When container release callbacks are received, the Role History is marked 
as dirty</li>
-<li>When nodes are requested or a release request made, the Role History is 
<em>not</em>
+<li>When nodes are allocated, the RoleHistory is marked as dirty</li>
+<li>When container release callbacks are received, the RoleHistory is marked 
as dirty</li>
+<li>When nodes are requested or a release request made, the RoleHistory is 
<em>not</em>
 marked as dirty. This information is not relevant on AM restart (see the sketch 
after this list).</li>
 </ol>
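<p>A sketch of those three rules and the asynchronous flush discussed below, 
again with hypothetical names: allocations and release callbacks set a dirty 
flag, issuing requests does not, and a background task writes the history out 
when the flag is set.</p>
<div class="codehilite"><pre>import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative: dirty-bit driven, asynchronous history persistence. */
class HistoryWriter {
  private final AtomicBoolean dirty = new AtomicBoolean(false);

  void onAllocation()      { dirty.set(true); } // rule 1: allocation =&gt; dirty
  void onReleaseCallback() { dirty.set(true); } // rule 2: release callback =&gt; dirty
  // rule 3: merely issuing a request or release does NOT touch the flag

  /** Called periodically from a background thread, not on every change. */
  void maybeFlush() {
    if (dirty.compareAndSet(true, false)) {
      saveToHdfs(); // takes a synchronized snapshot of the datastructure
    }
  }

  private void saveToHdfs() { /* write the history file */ }
}
</pre></div>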
 <p>As, at startup, a large number of allocations may arrive in a short period 
of time,
-the Role History may be updated very rapidly -yet as the containers are
+the RoleHistory may be updated very rapidly -yet as the containers are
 only recently activated, it is not likely that an immediately restarted Slider
-cluster would gain by re-requesting containers on them -their historical
+Application Instance would gain by re-requesting containers on them -their 
historical
 value is more important than their immediate past.</p>
-<p>Accordingly, the role history may be persisted to HDFS asynchronously, with
+<p>Accordingly, the RoleHistory may be persisted to HDFS asynchronously, with
 the dirty bit triggering a flush of the state to HDFS. The datastructure
 will still need to be synchronized for cross thread access, but the 
 sync operation will not be a major bottleneck, compared to saving the file on 
every
@@ -355,7 +389,7 @@ with the data saved in JSON or compresse
 instances on this node; there is no blacklisting. As a central blacklist
 for YARN has been proposed, it is hoped that this issue will be addressed 
centrally,
 without Slider having to remember which nodes are unreliable <em>for that 
particular
-Slider cluster</em>.</p>
+Application Instance</em>.</p>
 <p><strong>Anti-affinity</strong>: If multiple role instances are assigned to 
the same node,
 Slider has to choose on restart or flexing whether to ask for multiple
 containers on that node again, or to pick other nodes. The assumed policy is
@@ -369,13 +403,13 @@ history of a node -maintaining some kind
 node use and picking the heaviest used, or some other more-complex algorithm.
 This may be possible, but we'd need evidence that the problem existed before
 trying to address it.</p>
-<h1 id="the-nodemap-the-core-of-the-role-history">The NodeMap: the core of the 
Role History</h1>
-<p>The core data structure, the <code>NodeMap</code> is a map of every known 
node in the cluster, tracking
+<h1 id="the-nodemap">The NodeMap</h1>
+<p>A core data structure, the <code>NodeMap</code>, is a map of every known 
node in the YARN cluster, tracking
 how many containers are allocated to specific roles in it, and, when there
 are no active instances, when it was last used. This history is used to
 choose where to request new containers. Because of the asynchronous
-allocation and release of containers, the Role History also needs to track
-outstanding release requests --and, more critically, outstanding allocation
+allocation and release of containers, the History also needs to track
+outstanding release requests —and, more critically, outstanding allocation
 requests. If Slider has already requested a container for a specific role
 on a host, then asking for another container of that role would break
 anti-affinity requirements. Note that not tracking outstanding requests would
@@ -397,7 +431,7 @@ let YARN choose.</li>
 <li>Handles the multi-container on one node problem</li>
 <li>By storing details about every role, cross-role decisions could be 
possible</li>
 <li>Simple counters can track the state of pending add/release requests</li>
-<li>Scales well to a rapidly flexing cluster</li>
+<li>Scales well to a rapidly flexing Application Instance</li>
 <li>Simple to work with and persist</li>
 <li>Easy to view and debug</li>
 <li>Would support cross-role collection of node failures in future</li>
@@ -413,7 +447,7 @@ of recently explicitly released nodes ca
 <li>Need to track outstanding requests against nodes, so that if a request
 was satisfied on a different node, the original node's request count is
  decremented, <em>not that of the node actually allocated</em>. </li>
-<li>In a virtual cluster, may fill with node entries that are no longer in the 
cluster.
+<li>In a virtual YARN cluster, may fill with node entries that are no longer 
in the cluster.
 Slider should query the RM (or topology scripts?) to determine if nodes are 
still
 parts of the YARN cluster. </li>
 </ul>
@@ -439,7 +473,7 @@ parts of the YARN cluster. </li>
 
 <p>Maps a YARN NodeID record to a Slider <code>NodeInstance</code> 
structure</p>
 <h3 id="nodeinstance">NodeInstance</h3>
-<p>Every node in the cluster is modeled as an ragged array of 
<code>NodeEntry</code> instances, indexed
+<p>Every node in the YARN cluster is modeled as a ragged array of 
<code>NodeEntry</code> instances, indexed
 by role index -</p>
 <div class="codehilite"><pre><span class="n">NodeEntry</span><span 
class="p">[</span><span class="n">roles</span><span class="p">]</span>
 <span class="n">get</span><span class="p">(</span><span 
class="n">roleId</span><span class="p">):</span> <span 
class="n">NodeEntry</span> <span class="n">or</span> <span class="n">null</span>
@@ -488,7 +522,7 @@ so that the selection of containers to r
 to index the specific role in a container so as to determine which role
 has been offered in a container allocation message, and which role has
 been released on a release event.</p>
-<p>The Role History needs to track outstanding requests, so that
+<p>The History needs to track outstanding requests, so that
 when an allocation comes in, it can be mapped back to the original request.
 Simply looking up the nodes on the provided container and decrementing
 its request counter is not going to work -the container may be allocated
@@ -498,7 +532,7 @@ on a different node from that requested.
 rolling integer -Slider will assume that after 2^24 requests per role, it can 
be rolled,
 -though as we will be retaining a list of outstanding requests, a clash should 
not occur.
 The main requirement is to not have &gt; 2^24 outstanding requests for 
instances of a specific role,
-which places an upper bound on the size of a Slider cluster.</p>
+which places an upper bound on the size of the Application Instance.</p>
 <p>The splitting and merging will be implemented in a ContainerPriority class,
 for uniform access.</p>
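<p>The split can be sketched as plain bit arithmetic: the 2^24 bound above 
implies 24 bits for the rolling request counter, leaving the upper 8 bits of the 
32-bit priority for the role index. (The real ContainerPriority class may lay 
the fields out differently.)</p>
<div class="codehilite"><pre>/** Illustrative sketch of packing (role, requestId) into one 32-bit priority. */
final class PriorityCodec {
  static final int REQUEST_BITS = 24;
  static final int REQUEST_MASK = (1 &lt;&lt; REQUEST_BITS) - 1; // 0x00FFFFFF

  static int encode(int roleId, int requestId) {
    // requestId rolls over after 2^24 requests for that role
    return (roleId &lt;&lt; REQUEST_BITS) | (requestId &amp; REQUEST_MASK);
  }

  static int roleOf(int priority)    { return priority &gt;&gt;&gt; REQUEST_BITS; }
  static int requestOf(int priority) { return priority &amp; REQUEST_MASK; }
}
</pre></div>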
 <h3 id="outstandingrequest">OutstandingRequest</h3>
@@ -549,7 +583,7 @@ to 1 Sort + M list lookups.</p>
 list of all Nodes which are available for an instance of that role, 
 using a comparator that places the most recently released node ahead of older
 nodes.</p>
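<p>That ordering can be sketched as a comparator over an illustrative node 
record (the real list holds NodeInstance objects): the largest last-used 
timestamp, i.e. the most recently released node, sorts to the head.</p>
<div class="codehilite"><pre>import java.util.Comparator;
import java.util.List;

class AvailableNodeOrder {
  /** Illustrative stand-in for a node: hostname plus last-use timestamp. */
  record AvailableNode(String hostname, long lastUsed) {}

  /** Most recently released first: newest lastUsed at the head of the list. */
  static final Comparator&lt;AvailableNode&gt; MOST_RECENT_FIRST =
      Comparator.comparingLong(AvailableNode::lastUsed).reversed();

  static void sortForRole(List&lt;AvailableNode&gt; availableNodes) {
    availableNodes.sort(MOST_RECENT_FIRST);
  }
}
</pre></div>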
-<p>This list is not persisted -when a Slider Cluster is stopped it is moot, 
and when
+<p>This list is not persisted —when the Application Instance is stopped it 
is moot, and when
 an AM is restarted this structure will be rebuilt.</p>
 <ol>
 <li>When a node is needed for a new request, this list is consulted first.</li>
@@ -560,22 +594,22 @@ of the list for that role.</li>
 </ol>
 <p>If the list is empty during a container request operation, it means
 that the Role History does not know of any nodes
-in the cluster that have hosted instances of that role and which are not
+in the YARN cluster that have hosted instances of that role and which are not
 in use. There are then two possible strategies to select a node</p>
 <ol>
-<li>Ask for an instance anywhere in the cluster (policy in Slider 0.5)</li>
+<li>Ask for an instance anywhere in the YARN cluster (policy in Slider 
0.5)</li>
 <li>Search the node map to identify other nodes which are (now) known about,
-but which are not hosting instances of a specific role -this can be used
+but which are not hosting instances of a specific role —this can be used
 as the target for the next resource request.</li>
 </ol>
-<p>Strategy #1 is simpler; Strategy #2 <em>may</em> decrease the affinity in 
the cluster,
+<p>Strategy #1 is simpler; Strategy #2 <em>may</em> decrease the affinity of 
component placement,
 as the AM will be explicitly requesting an instance on a node which it knows
 is not running an instance of that role.</p>
 <h4 id="issue-what-to-do-about-failing-nodes">ISSUE What to do about failing 
nodes?</h4>
 <p>Should a node whose container just failed be placed at the
 top of the stack, ready for the next request? </p>
 <p>If the container failed due to an unexpected crash in the application, 
asking
-for that container back <em>is the absolute right strategy</em> -it will bring
+for that container back <em>is the absolute right strategy</em> —it will 
bring
 back a new role instance on that machine. </p>
 <p>If the container failed because the node is now offline, the container 
request 
 will not be satisfied by that node.</p>
@@ -585,10 +619,10 @@ then re-requesting containers on it will
 <h3 id="bootstrap">Bootstrap</h3>
 <p>Persistent Role History file not found; empty data structures created.</p>
 <h3 id="restart">Restart</h3>
-<p>When starting a cluster, the Role History should be loaded. </p>
+<p>When starting an Application Instance, the Role History should be loaded. 
</p>
 <p>If the history is missing <em>or cannot be loaded for any reason</em>,
 Slider must revert to the bootstrap actions.</p>
-<p>If found, the Role History will contain Slider's view of the Slider 
Cluster's
+<p>If found, the Role History will contain Slider's view of the Application 
Instance's
 state at the time the history was saved, explicitly recording the last-used
 time of all nodes no longer hosting a role's container. By noting which roles
 were actually being served, it implicitly notes which nodes have a 
<code>last_used</code>
@@ -597,9 +631,9 @@ all node entries listed as having active
 saved must have more recent data than those nodes listed as inactive.</p>
 <p>When rebuilding the data structures, the fact that nodes were active at
 save time must be converted into the data that indicates that the nodes
-were at least in use <em>at the time the data was saved</em>. The state of the 
cluster
-after the last save is unknown.</p>
-<p>1: Role History loaded; Failure =&gt; Bootstrap.
+were at least in use <em>at the time the data was saved</em>.
+The state of the Application Instance after the last save is unknown.</p>
+<p>1: History loaded; Failure =&gt; Bootstrap.
 2: Future: if role list enum != current enum, remapping could take place. 
Until then: fail.
 3: Mark all nodes as active at save time to that of the</p>
 <div class="codehilite"><pre><span class="c1">//define a threshold</span>
@@ -645,26 +679,21 @@ from the list of available nodes</p>
 </pre></div>
 
 
-<p>There's no need to resort the available node list -all that has happened
+<p>There's no need to resort the available node list —all that has happened
 is that some entries have been removed</p>
-<p><strong>Issue</strong>: what if requests come in for a <code>(role, 
requestID)</code> for
-the previous instance of the AM? Could we just always set the initial
-requestId counter to a random number and hope the collision rate is very, very 
-low (2^24 * #(outstanding_requests)). If YARN-1041 ensures that
-a restarted AM does not receive outstanding requests, this issue goes away.</p>
 <h3 id="teardown">Teardown</h3>
 <ol>
 <li>If dirty, save role history to its file.</li>
 <li>Issue release requests</li>
-<li>Maybe update data structures on responses, but do not mark Role History
+<li>Maybe update data structures on responses, but do not mark RoleHistory
 as dirty or flush it to disk.</li>
 </ol>
 <p>This strategy is designed to eliminate the expectation that there will ever
-be a clean shutdown -and so that the startup-time code should expect
-the Role History to have been written during shutdown. Instead the code
+be a clean shutdown —and so that the startup-time code should expect
+the RoleHistory data to have been written during shutdown. Instead the code
 should assume that the history was saved to disk at some point during the life
-of the Slider Cluster -ideally after the most recent change, and that the 
information
-in it is only an approximate about what the previous state of the cluster 
was.</p>
+of the Application Instance —ideally after the most recent change, and that 
the information
+in it is only approximate information about what the previous state of the Application 
Instance was.</p>
 <h3 id="flex-requesting-a-container-in-role-role">Flex: Requesting a container 
in role <code>role</code></h3>
 <div class="codehilite"><pre><span class="n">node</span> <span 
class="p">=</span> <span class="n">availableNodes</span><span 
class="p">[</span><span class="n">roleId</span><span class="p">].</span><span 
class="n">pop</span><span class="p">()</span> 
 <span class="k">if</span> <span class="n">node</span> !<span 
class="p">=</span> <span class="n">null</span> <span class="p">:</span>
@@ -679,7 +708,7 @@ in it is only an approximate about what
 
 
 <p>There is a bias here towards previous nodes, even if the number of nodes
-in the cluster has changed. This is why a node is picked where the number
+in the Application Instance has changed. This is why a node is picked where 
the number
 of <code>active-releasing == 0 and requested == 0</code>, rather than where it 
is simply the lowest
 value of <code>active + requested - releasing</code>: if there is no node in 
the nodemap that
 is not running an instance of that role, it is left to the RM to decide where
@@ -687,11 +716,11 @@ the role instance should be instantiated
 <p>This bias towards previously used nodes also means that (lax) requests
 will be made of nodes that are currently unavailable either because they
 are offline or simply overloaded with other work. In such circumstances,
-the node will have an active count of zero -so the search will find these
-nodes and request them -even though the requests cannot be satisfied.
+the node will have an active count of zero —so the search will find these
+nodes and request them —even though the requests cannot be satisfied.
 As a result, the request will be downgraded to a rack-local or cluster-wide
-request -an acceptable degradation on a cluster where all the other entries
-in the nodemap have instances of that specific node -but not when there are
+request —an acceptable degradation on an Application Instance where all the 
other entries
+in the nodemap have instances of that specific node —but not when there are
 empty nodes. </p>
 <h4 id="solutions">Solutions</h4>
 <ol>
@@ -701,7 +730,7 @@ iterate through the values. This would p
 node from being requested first.</p>
 </li>
 <li>
-<p>Keep track of requests, perhaps through a last-requested counter -and use
+<p>Keep track of requests, perhaps through a last-requested counter —and use
 this in the selection process. This would radically complicate the selection
 algorithm, and would not even distinguish "node recently released that was
 also the last requested" from "node that has not recently satisfied requests
@@ -718,13 +747,13 @@ is used in place of a recent one and the
 <p>But there are consequences:</p>
 <p><strong>Performance</strong>:</p>
 <p>Using the history to pick a recent node may increase selection times on a
-large cluster, as for every instance needed, a scan of all nodes in the
+large Application Instance, as for every instance needed, a scan of all nodes 
in the
 nodemap is required (unless there is some clever bulk assignment list being 
built
 up), or a sorted version of the nodemap is maintained, with a node placed
 at the front of this list whenever it is updated.</p>
 <p><strong>Startup-time problems</strong></p>
 <p>There is also the risk that while starting an application instance, the 
<code>rolehistory.saved</code>
-flag may be updated while the cluster flex is in progress, so making the saved
+flag may be updated while the flex is in progress, so making the saved
 nodes appear out of date. Perhaps the list of recently released nodes could
 be rebuilt at startup time.</p>
 <p>The proposed <code>recentlyReleasedList</code> addresses this, though it 
creates
@@ -737,7 +766,7 @@ from the last-used fields in the node en
 
 <p>This is the callback received when containers have been allocated.
 Due to (apparently) race conditions, the AM may receive duplicate
-container allocations -Slider already has to recognize this and 
+container allocations —Slider already has to recognize this and 
 currently simply discards any surplus.</p>
 <p>If the AM tracks outstanding requests made for specific hosts, it
 will need to correlate allocations with the original requests, so as to 
decrement
@@ -799,7 +828,7 @@ other sync problem.</p>
 <li>
 <p>The node selected for the original request has its request for a role 
instance
 decremented, so that it may be viewed as available again. The node is also
-re-inserted into the AvailableNodes list -not at its head, but at its position
+re-inserted into the AvailableNodes list —not at its head, but at its 
position
 in the total ordering of the list, as sketched below.</p>
 </li>
 </ol>
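<p>A sketch of that reconciliation, under assumed names (the history callbacks 
here are hypothetical): the outstanding request is looked up by priority, and 
when YARN chose a different host, it is the requested host whose counter is 
decremented and which rejoins the available list.</p>
<div class="codehilite"><pre>import java.util.HashMap;
import java.util.Map;

/** Illustrative reconciliation of allocations with outstanding requests. */
class RequestReconciler {
  // priority -&gt; host that was explicitly requested at that priority
  private final Map&lt;Integer, String&gt; outstanding = new HashMap&lt;&gt;();

  void requestIssued(int priority, String requestedHost) {
    outstanding.put(priority, requestedHost);
  }

  /** Called from onContainersAllocated() for each granted container. */
  void onAllocated(int priority, String allocatedHost, History history) {
    String requestedHost = outstanding.remove(priority);
    if (requestedHost != null &amp;&amp; !requestedHost.equals(allocatedHost)) {
      // decrement the *requested* node, not the one actually allocated,
      // and re-insert it at its ordered position in the available list
      history.decrementRequested(requestedHost);
      history.reinsertAvailable(requestedHost);
    }
  }

  /** Hypothetical subset of the history operations used above. */
  interface History {
    void decrementRequested(String host);
    void reinsertAvailable(String host);
  }
}
</pre></div>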
@@ -814,11 +843,11 @@ in the associated <code>RoleInstance</co
 adjusted to indicate it has one more live node and one less starting node.</p>
 <h3 id="nmclientasync-callback-oncontainerstartfailed">NMClientAsync Callback: 
 onContainerStartFailed()</h3>
 <p>The AM uses this as a signal to remove the container from the list
-of starting containers -the count of starting containers for the relevant
+of starting containers —the count of starting containers for the relevant
 NodeEntry is decremented. If the node is now available for instances of this
 container, it is returned to the queue of available nodes.</p>
-<h3 id="flex-releasing-a-role-instance-from-the-cluster">Flex: Releasing a  
role instance from the cluster</h3>
-<p>Simple strategy: find a node with at least one active container</p>
+<h3 
id="flex-releasing-a-component-instance-from-the-application-instance">Flex: 
Releasing a Component instance from the Application Instance</h3>
+<p>Simple strategy: find a node with at least one active container of that 
type</p>
 <div class="codehilite"><pre><span class="n">select</span> <span 
class="n">a</span> <span class="n">node</span> <span class="n">N</span> <span 
class="n">in</span> <span class="n">nodemap</span> <span class="n">where</span> 
<span class="k">for</span> <span class="n">NodeEntry</span><span 
class="p">[</span><span class="n">roleId</span><span class="p">]:</span> <span 
class="n">active</span> <span class="o">&gt;</span> <span 
class="n">releasing</span><span class="p">;</span> 
 <span class="n">nodeentry</span> <span class="p">=</span> <span 
class="n">node</span><span class="p">.</span><span class="n">get</span><span 
class="p">(</span><span class="n">roleId</span><span class="p">)</span>
 <span class="n">nodeentry</span><span class="p">.</span><span 
class="n">active</span><span class="o">--</span><span class="p">;</span>
@@ -888,7 +917,7 @@ has completed although it wasn't on the
 
 
 <p>By calling <code>reviewRequestAndReleaseNodes()</code> the AM triggers
-a re-evaluation of how many instances of each node a cluster has, and how many
+a re-evaluation of how many instances of each role an Application Instance 
has, and how many
 it needs. If a container has failed and that freed up all role instances
 on that node, it will have been inserted at the front of the 
<code>availableNodes</code> list.
 As a result, it is highly likely that a new container will be requested on 
@@ -898,10 +927,10 @@ be if other containers were completed in
 <p>Notes made while implementing the design.</p>
 <p><code>OutstandingRequestTracker</code> should also track requests made with
 no target node; this makes seeing what is going on easier. 
<code>ARMClientImpl</code>
-is doing something similar, on a priority-by-priority basis -if many
+is doing something similar, on a priority-by-priority basis —if many
 requests are made, each with their own priority, that base class's hash tables
 may get overloaded. (it assumes a limited set of priorities)</p>
-<p>Access to the role history datastructures was restricted to avoid
+<p>Access to the RoleHistory datastructures was restricted to avoid
 synchronization problems. Protected access is permitted so that a
 test subclass can examine (and change?) the internals.</p>
 <p><code>NodeEntries</code> need to add a launching value separate from active so that
@@ -909,7 +938,7 @@ when looking for nodes to release, no at
 a node that has been allocated but is not yet live.</p>
 <p>We can't reliably map from a request to a response. Does that matter?
 If we issue a request for a host and it comes in on a different port, do we
-care? Yes -but only because we are trying to track nodes which have requests
+care? Yes —but only because we are trying to track nodes which have requests
 outstanding so as not to issue new ones. But if we just pop the entry
 off the available list, that becomes moot.</p>
 <p>Proposal: don't track the requesting numbers in the node entries, just
@@ -918,7 +947,7 @@ in the role status fields.</p>
 node on them was requested but not satisfied.</p>
 <p>Other issues: should we place nodes on the available list as soon as all 
the entries
 have been released? I.e. before YARN has replied.</p>
-<p>RoleStats were removed -left in app state. Although the rolestats would
+<p>RoleStats were removed —left in app state. Although the rolestats would
 belong here, leaving them where they were reduced the amount of change
 in the <code>AppState</code> class, so risk of something breaking.</p>
 <h2 id="miniyarncluster-node-ids">MiniYARNCluster node IDs</h2>
@@ -927,14 +956,14 @@ against file://; so mini tests with &gt;
 <code>NodeId:NodeInstance</code>. What will happen is that 
<code>NodeInstance getOrCreateNodeInstance(Container container)</code>
will always return the same (now shared) <code>NodeInstance</code>.</p>
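<p>A sketch of why the sharing happens, with illustrative code: if the lookup is 
keyed on hostname alone, every mini-cluster NodeId (same host, different ports) 
resolves to one entry.</p>
<div class="codehilite"><pre>import java.util.HashMap;
import java.util.Map;

/** Illustrative: hostname-keyed lookup collapses MiniYARNCluster nodes. */
class NodeInstanceMap {
  private final Map&lt;String, NodeInstance&gt; byHostname = new HashMap&lt;&gt;();

  NodeInstance getOrCreateNodeInstance(String hostname) {
    // localhost:port1 and localhost:port2 share a hostname, so they
    // both resolve to the same (shared) NodeInstance
    return byHostname.computeIfAbsent(hostname, NodeInstance::new);
  }

  static class NodeInstance {
    final String hostname;
    NodeInstance(String hostname) { this.hostname = hostname; }
  }
}
</pre></div>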
-<h2 id="releasing-containers-when-shrinking-a-cluster">Releasing Containers 
when shrinking a cluster</h2>
+<h2 id="releasing-containers-when-shrinking-an-application-instance">Releasing 
Containers when shrinking an Application Instance</h2>
 <p>When identifying instances to release in a bulk downscale operation, the 
full
 list of targets must be identified together. This is not just to eliminate
 multiple scans of the data structures, but because the containers are not
-released until the queued list of actions are executed -the nodes' 
release-in-progress
+released until the queued list of actions are executed —the nodes' 
release-in-progress
 counters will not be incremented until after all the targets have been 
identified.</p>
 <p>It also needs to handle the scenario where there are many role instances on 
a
-single server -it should prioritize those. </p>
+single server —it should prioritize those. </p>
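+<p>A sketch of such a bulk selection, with assumed names: all targets are chosen 
in a single pass over a snapshot of per-host instance counts, heaviest hosts 
first, before any release actions are queued.</p>
+<div class="codehilite"><pre>import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative: pick every release target for a role in one scan. */
class DownscaleSelector {
  record HostLoad(String hostname, int activeInstances) {}

  /** Choose hosts to shed toRelease instances, many-instance hosts first. */
  static List&lt;String&gt; selectReleases(List&lt;HostLoad&gt; hosts, int toRelease) {
    List&lt;HostLoad&gt; sorted = new ArrayList&lt;&gt;(hosts);
    sorted.sort(Comparator.comparingInt(HostLoad::activeInstances).reversed());
    List&lt;String&gt; targets = new ArrayList&lt;&gt;();
    for (HostLoad h : sorted) {
      int n = h.activeInstances();
      while (n-- &gt; 0 &amp;&amp; targets.size() &lt; toRelease) {
        targets.add(h.hostname()); // one entry per container to release
      }
    }
    return targets;
  }
}
</pre></div>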
 <p>The NodeMap/NodeInstance/NodeEntry structure is adequate for identifying 
nodes,
 at least provided there is a 1:1 mapping of hostname to NodeInstance. But it
 is not enough to track containers in need of release: the AppState needs
@@ -950,24 +979,24 @@ in the container</p>
 role should be dropped from the list.</p>
 <p>This can happen when an instance was allocated on a different node from
 that requested.</p>
-<h3 
id="finding-a-node-when-a-role-has-instances-in-the-cluster-but-nothing">Finding
 a node when a role has instances in the cluster but nothing</h3>
-<p>known to be available</p>
+<h3 
id="finding-a-node-when-a-component-has-some-live-instances-and-no-available-hosts-known-to">Finding
 a node when a component has some live instances and no available hosts known 
to Slider</h3>
 <p>One condition found during testing is the following: </p>
 <ol>
-<li>A role has one or more instances running in the cluster</li>
+<li>A role has one or more instances running in the YARN cluster</li>
 <li>A role has no entries in its available list: there is no history of the 
 role ever being on nodes other than those currently in use.</li>
 <li>A new instance is requested.</li>
 </ol>
 <p>In this situation, the <code>findNodeForNewInstance</code> method returns 
null: there
 is no recommended location for placement. However, this is untrue: all
-nodes in the cluster <code>other</code> than those in use are the recommended 
nodes. </p>
-<p>It would be possible to build up a list of all known nodes in the cluster 
that
+nodes in the YARN cluster <em>other</em> than those in use are the 
recommended nodes. </p>
+<p>It would be possible to build up a list of all known nodes in the YARN 
cluster that
 are not running this role and use that in the request, effectively telling the
 AM to pick one of the idle nodes. By not doing so, we increase the probability
 that another instance of the same role will be allocated on a node in use,
 a probability which (were there capacity on these nodes and placement random) would 
be
-<code>1/(clustersize-roleinstances)</code>. The smaller the cluster and the 
bigger the
+<code>1/(clustersize-roleinstances)</code>. The smaller the YARN cluster and 
the bigger the
 application, the higher the risk.</p>
 <p>This could be revisited, if YARN does not support anti-affinity between new
 requests at a given priority and existing ones: the solution would be to
@@ -995,17 +1024,17 @@ the ordering of that list in most-recent
 the set of outstanding requests will be retained, so all these hosts in the list
 will be considered unavailable for new location-specific requests.
 This may imply that new requests that could be explicitly placed will now only
-be randomly placed -however, it is moot on the basis that if there are 
outstanding
+be randomly placed —however, it is moot on the basis that if there are 
outstanding
 container requests it means the RM cannot grant resources: new requests at the
 same priority (i.e. same Slider Role ID) will not be granted either.</p>
 <p>The only scenario where this would be different is if the resource 
requirements
-of instances of the target role were decreated during a cluster flex such that
+of instances of the target role were decreased during a flex such that
 the placement could now be satisfied on the target host. This is not considered
 a significant problem.</p>
 <h1 id="persistence_1">Persistence</h1>
 <p>The initial implementation uses the JSON-formatted Avro format; while 
significantly
 less efficient than a binary format, it is human-readable.</p>
-<p>Here are sequence of entries from a test run on a single node cluster; 
running 1 HBase Master
+<p>Here is a sequence of entries from a test run on a single-node YARN cluster, 
running 1 HBase Master
 and two region servers.</p>
 <p>Initial save; the instance of Role 1 (HBase master) is live, Role 2 (RS) is 
not.</p>
 <div class="codehilite"><pre><span class="p">{</span>&quot;<span 
class="n">entry</span>&quot;<span class="p">:{</span>&quot;<span 
class="n">org</span><span class="p">.</span><span class="n">apache</span><span 
class="p">.</span><span class="n">hoya</span><span class="p">.</span><span 
class="n">avro</span><span class="p">.</span><span 
class="n">RoleHistoryHeader</span>&quot;<span class="p">:{</span>&quot;<span 
class="n">version</span>&quot;<span class="p">:</span>1<span 
class="p">,</span>&quot;<span class="n">saved</span>&quot;<span 
class="p">:</span>1384183475949<span class="p">,</span>&quot;<span 
class="n">savedx</span>&quot;<span class="p">:</span>&quot;14247<span 
class="n">c3aeed</span>&quot;<span class="p">,</span>&quot;<span 
class="n">roles</span>&quot;<span class="p">:</span>3<span class="p">}}}</span>
@@ -1028,8 +1057,8 @@ and two region servers.</p>
 </pre></div>
 
 
-<p>At this point the cluster was stopped and started.</p>
-<p>When the cluster is restarted, every node that was active for a role at the 
time the file was saved <code>1384183476028</code>
+<p>At this point the Application Instance was stopped and started.</p>
+<p>When the Application Instance is restarted, every node that was active for 
a role at the time the file was saved <code>1384183476028</code>
 is given a last_used timestamp of that time. </p>
 <p>When the history is next saved, the master has come back onto the (single) 
node;
 it is active while its <code>last_used</code> timestamp is the previous file's 
timestamp.
@@ -1054,7 +1083,7 @@ No region servers are yet live.</p>
 </pre></div>
 
 
-<p>The <code>last_used</code> timestamps will not be changed until the cluster 
is shrunk or restarted, as the <code>active</code> flag being set
+<p>The <code>last_used</code> timestamps will not be changed until the 
Application Instance is shrunk or restarted, as the <code>active</code> flag 
being set
 implies that the server is running both roles at the save time of 
<code>1384183512217</code>.</p>
 <h2 id="resolved-issues">Resolved issues</h2>
 <blockquote>

