> On 18 May 2015, at 23:41, Rajesh Kartha <[email protected]> wrote:
>
> Hello,
>
> One of the early requests we got for improving Slider was to have a
> way of *ensuring only a single process* of the given application runs
> on any given node. I have read about the ANTI-AFFINITY flag but was not
> fully sure about its implementation.
>
> Hence have been trying to piece things together based on the comments in:
>
> - SLIDER-82
> - Steve's blog at
>   http://steveloughran.blogspot.co.uk/2015/05/dynamic-datacentre-applications.html
> - The Slider wiki at
>   http://slider.incubator.apache.org/design/rolehistory.html
> - Looking at the code
>
> Today the flag PlacementPolicy.ANTI_AFFINITY_REQUIRED seems like a place
> holder and is not being used currently in the flow.

I think it's used on restart: explicit requests for a host are not made
if there's already an instance on that node and the anti-affinity flag
is set.

> Also, as I understand, the main method where the check on containers happen
> is in the Event Handler:
>
> *AppState.onContainersAllocated()*
>
> Since this method makes the decision on the allocated containers before
> launching the role, I was thinking of a simple approach where we could:
>
> 1) check for the RoleStatus.isAntiAffinePlacement() to be true
> 2) check if the NodeInstance on which the current container is allocated to
>    be either in the RoleHistory.listActiveNodes(roleId) or found to be
>    unreliable
> 3) discard the container without decrementing the request count for the role
> 4) if the container check does not meet the check in #2 then proceed with
>    the flow and continue with the launch
>
> The launching of the role via the launchService happens after this check,
> so I would hope these checks may not be that expensive.
>

I thought about this, but wasn't happy with it. We will end up
discarding a lot of containers. These may look like simple idle
capacity, but in a cluster with pre-emption enabled a Slider app could
be killing other work to get containers it then discards. Even without
that, there's a risk that you end up getting back those same hosts again
and again. The same goes for unreliable hosts.

> One other potential area for such a check is
> RoleHistory.findNodeForNewInstance(role) during
> the iteration of the list of Node Instances from the getNodesForRoleId(),
> but based on my experiments the listofActiveNodes() and the
> getNodesForRoleId() seemed mutually exclusive, hence this check may not be
> needed there.
>
> Again, not sure if the above can address the different scenarios that is
> expected from the ANTI-AFFINITY flag, but was wondering if this was
> feasible as a first approach to having some ANTI-AFFINITY support.

I think it's a step in the right direction, but we really need to make
the leap to doing what Twill did and use the blacklist to exclude nodes
where we either have active containers or their reliability is
considered too low.

I'm not planning to do any work in that area in the near future, so if
you want to sit down and start doing it, feel free!
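
To make the blacklist idea a little more concrete, here is a minimal sketch of the bookkeeping it would need. This is not Slider code: the class `AntiAffinityBlacklist`, its `Delta` result type and the `update` method are hypothetical names made up for illustration, and the host sets are assumed to come from something like the role history. It only computes which hosts to add to and remove from the blacklist between two rounds; the two lists have the shape that YARN's `AMRMClient.updateBlacklist(blacklistAdditions, blacklistRemovals)` accepts, but the sketch does not call YARN itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of Slider): tracks the hosts that an
 * anti-affine role should avoid and reports the changes between rounds.
 */
public class AntiAffinityBlacklist {

    /** Hosts blacklisted after the most recent call to update(). */
    private final Set<String> current = new TreeSet<>();

    /** Hosts to add to and remove from the blacklist in one round. */
    public static final class Delta {
        public final List<String> additions;
        public final List<String> removals;

        Delta(List<String> additions, List<String> removals) {
            this.additions = additions;
            this.removals = removals;
        }
    }

    /**
     * Recompute the blacklist as (active hosts) union (unreliable hosts)
     * and return the difference from the previous round.
     * Both lists in the result are sorted by host name.
     */
    public Delta update(Set<String> activeHosts, Set<String> unreliableHosts) {
        Set<String> wanted = new TreeSet<>(activeHosts);
        wanted.addAll(unreliableHosts);

        // Newly excluded hosts: wanted now, not blacklisted before.
        List<String> additions = new ArrayList<>();
        for (String host : wanted) {
            if (!current.contains(host)) {
                additions.add(host);
            }
        }

        // Hosts to release: blacklisted before, no longer wanted.
        List<String> removals = new ArrayList<>();
        for (String host : current) {
            if (!wanted.contains(host)) {
                removals.add(host);
            }
        }

        current.clear();
        current.addAll(wanted);
        return new Delta(additions, removals);
    }
}
```

The point of returning deltas rather than the full set is that the request for containers can then exclude the unwanted hosts up front, instead of accepting containers on them and discarding them afterwards.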
