Re: A basic approach towards implementing the ANTI-AFFINITY in Slider

Steve Loughran Tue, 19 May 2015 12:35:16 -0700

> On 18 May 2015, at 23:41, Rajesh Kartha <[email protected]> wrote:
> 
> Hello,
> 
> One of the early requests we got for improving Slider was to have a way
> of *ensuring only a single process* of the given application runs on any
> given node. I have read about the ANTI-AFFINITY flag but was not fully
> sure about its implementation.
> 
> Hence I have been trying to piece things together based on the comments in:
> 
> - SLIDER-82
> - Steve's blog at
> http://steveloughran.blogspot.co.uk/2015/05/dynamic-datacentre-applications.html
> - The Slider wiki at -
> http://slider.incubator.apache.org/design/rolehistory.html
> - Looking at the code
> 
> Today the flag PlacementPolicy.ANTI_AFFINITY_REQUIRED seems like a
> placeholder and is not currently being used in the flow.


I think it's used on restart, where explicit requests for containers on
specific hosts are not made if there's already an instance on that node and
the anti-affinity flag is set.

> 
> Also, as I understand it, the main method where the check on containers
> happens is in the event handler:
> 
> *AppState.onContainersAllocated()*
> 
> Since this method makes the decision on the allocated containers before
> launching the role, I was thinking of a simple approach where we could:
> 
> 1) check that RoleStatus.isAntiAffinePlacement() is true
> 2) check whether the NodeInstance on which the current container is
> allocated is either in RoleHistory.listActiveNodes(roleId) or found to be
> unreliable
> 3) if so, discard the container without decrementing the request count for
> the role
> 4) if the container does not match the check in #2, proceed with the flow
> and continue with the launch
> 
> The launching of the role via the launchService happens after this check,
> so I would hope these checks may not be that expensive.
> 

I thought about this, but wasn't happy with it.

We will end up discarding a lot of containers. This may seem like simply idle
capacity, but in a cluster with pre-emption enabled a Slider app could be
killing other work to get containers it then discards.

Even without that, there's a risk that you end up getting back those same
hosts again and again. The same goes for unreliable hosts.
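
To make the concern concrete, the allocation-time check proposed above could
be sketched roughly as follows. This is a standalone illustration only: the
class AntiAffinityCheck and its methods are hypothetical stand-ins, not
Slider's AppState or RoleHistory code, and hosts are reduced to plain strings.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the "discard on allocation" check; not Slider code. */
class AntiAffinityCheck {
    private final Set<String> activeHosts = new HashSet<>();
    private final Set<String> unreliableHosts = new HashSet<>();
    private int discarded = 0;

    /** Record a host whose reliability is considered too low. */
    void markUnreliable(String host) {
        unreliableHosts.add(host);
    }

    /**
     * Decide what to do with a container allocated on the given host.
     * Returns true if it should be launched, false if it should be released.
     */
    boolean accept(String host, boolean antiAffine) {
        if (antiAffine
                && (activeHosts.contains(host) || unreliableHosts.contains(host))) {
            // The request count is not decremented, so the role is still
            // short one instance and another container must be requested.
            discarded++;
            return false;
        }
        activeHosts.add(host);
        return true;
    }

    /** Number of allocations thrown away so far. */
    int discardedCount() {
        return discarded;
    }
}
```

Every rejected allocation here is a container that was already granted, and
nothing in the check stops the same host from being offered again on the next
request, so the discarded count can keep growing.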


> One other potential area for such a check is
> RoleHistory.findNodeForNewInstance(role) during
> the iteration of the list of Node Instances from the getNodesForRoleId(),
> but based on my experiments the listActiveNodes() and the
> getNodesForRoleId() seemed mutually exclusive, hence this check may not be
> needed there.
> 
> Again, not sure if the above can address the different scenarios that are
> expected from the ANTI-AFFINITY flag, but was wondering if this was
> feasible as a first approach to having some ANTI-AFFINITY support.


I think it's a step in the right direction, but we really need to make the
leap to doing what Twill did and use the blacklist to exclude nodes where we
either have active containers or their reliability is considered too low.
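
A minimal sketch of that blacklist bookkeeping, under stated assumptions:
BlacklistPlanner and Delta are hypothetical names, hosts are plain strings,
and how "active" and "unreliable" hosts are determined is left to the caller.
The two lists it produces have the shape that YARN's
AMRMClient.updateBlacklist(additions, removals) takes, but the sketch does not
call YARN itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that tracks which hosts should be blacklisted. */
class BlacklistPlanner {
    /** Hosts we believe are currently on the blacklist. */
    private final Set<String> current = new TreeSet<>();

    /** Changes to send to the resource manager. */
    static final class Delta {
        final List<String> additions;
        final List<String> removals;

        Delta(List<String> additions, List<String> removals) {
            this.additions = additions;
            this.removals = removals;
        }
    }

    /**
     * Compute the blacklist changes needed so that exactly the hosts with an
     * active container, or with low reliability, are excluded.
     */
    Delta update(Set<String> activeHosts, Set<String> unreliableHosts) {
        Set<String> wanted = new TreeSet<>(activeHosts);
        wanted.addAll(unreliableHosts);

        List<String> additions = new ArrayList<>();
        for (String host : wanted) {
            if (!current.contains(host)) {
                additions.add(host);
            }
        }
        List<String> removals = new ArrayList<>();
        for (String host : current) {
            if (!wanted.contains(host)) {
                removals.add(host);
            }
        }

        // Remember the new state so the next call only reports differences.
        current.clear();
        current.addAll(wanted);
        return new Delta(additions, removals);
    }
}
```

Sending only the differences means a host is removed from the blacklist as
soon as its container goes away, so it becomes eligible for placement again,
and no container has to be allocated just to be thrown away.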

I'm not planning to do any work in that area in the near future, so if you
want to sit down and start doing it, feel free!
