I see a host locality available as an attribute in DAG for individual operators. If affinity doesn't support this today, we could probably add it. You could also make setting a blacklist directly a convenience function on top of affinity.
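Below is a minimal sketch of the idea above: a node blacklist offered as a convenience layered on host-level anti-affinity rules. Every name in it (`HostAffinityRule`, `HostRules`, `blacklist`, `isAllowed`) is hypothetical and made up for illustration. It is not the Apex `AffinityRule` API, which may not support naming a specific host today. It assumes an empty operator set means "all operators".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; this is NOT Apex's AffinityRule API.
final class HostAffinityRule {
  enum Type { AFFINITY, ANTI_AFFINITY }

  final Type type;
  final String host;
  // An empty operator set means the rule applies to every operator in the DAG.
  final Set<String> operators;

  HostAffinityRule(Type type, String host, Set<String> operators) {
    this.type = type;
    this.host = host;
    this.operators = operators;
  }

  boolean appliesTo(String operator) {
    return operators.isEmpty() || operators.contains(operator);
  }
}

// Hypothetical rule set with a blacklist convenience layered on top.
final class HostRules {
  private final List<HostAffinityRule> rules = new ArrayList<>();

  void add(HostAffinityRule rule) {
    rules.add(rule);
  }

  // Convenience: blacklisting a host is anti-affinity to that host for all operators.
  void blacklist(String... hosts) {
    for (String host : hosts) {
      rules.add(new HostAffinityRule(
          HostAffinityRule.Type.ANTI_AFFINITY, host, Collections.emptySet()));
    }
  }

  // An operator may run on a host unless an applicable anti-affinity rule names it,
  // or positive rules pin the operator to other hosts. Host names are compared exactly.
  boolean isAllowed(String operator, String host) {
    boolean pinned = false;
    boolean pinnedHere = false;
    for (HostAffinityRule rule : rules) {
      if (!rule.appliesTo(operator)) {
        continue;
      }
      if (rule.type == HostAffinityRule.Type.ANTI_AFFINITY) {
        if (rule.host.equals(host)) {
          return false;
        }
      } else {
        pinned = true;
        if (rule.host.equals(host)) {
          pinnedHere = true;
        }
      }
    }
    return !pinned || pinnedHere;
  }
}
```

After `rules.blacklist("node20")`, `isAllowed(op, "node20")` returns false for every operator, which covers the "don't deploy any operators on Node20" case. Keeping the blacklist as a thin wrapper means the placement check only has to understand one kind of rule.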
On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde <sand...@datatorrent.com> wrote:
> Pramod,
>
> How do I specify "don't deploy any operators on Node20" using anti-affinity?
>
> I don't see any examples here:
> http://apex.apache.org/docs/apex/application_development/#affinity-rules
>
> On Thu, Dec 1, 2016 at 11:31 AM Pramod Immaneni <pra...@datatorrent.com> wrote:
> > Shouldn't this already be covered by anti-affinity? Today users can specify multiple affinity rules, and for each rule they can specify positive or negative affinity, locality, and operator selection. If an affinity rule specifying negative affinity, node locality, and all operators does not work, then let's fix that scenario instead of creating a new option.
> >
> > On Thu, Dec 1, 2016 at 11:17 AM, Sandesh Hegde <sand...@datatorrent.com> wrote:
> > > I have created a JIRA for adding the list of blacklisted nodes:
> > > https://issues.apache.org/jira/browse/APEXCORE-584
> > >
> > > On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare <san...@datatorrent.com> wrote:
> > > > Yes, Ram explained to me that in practice this would be a useful feature for Apex devops, who typically have no control over the Hadoop/YARN cluster.
> > > >
> > > > On 11/30/16, 9:22 PM, "Mohit Jotwani" <mo...@datatorrent.com> wrote:
> > > > > This is a practical scenario where developers would need to exclude certain nodes, as those nodes might be reserved for some mission-critical applications. It would be good to have this feature.
> > > > >
> > > > > I understand that STRAM should not get into resource management and should still rely on YARN; however, as the App Master it should have the right to reject the nodes offered by YARN and request other resources.
> > > > >
> > > > > Regards,
> > > > > Mohit
> > > > >
> > > > > On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde <sand...@datatorrent.com> wrote:
> > > > > > Apex has automatic blacklisting of troublesome nodes; please take a look at the following attributes:
> > > > > >
> > > > > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > > > > https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > > > >
> > > > > > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <r...@datatorrent.com> wrote:
> > > > > > > Not sure if this is what Milind had in mind, but we often run into situations where the dev group working with Apex has no control over cluster configuration: to make any changes to the cluster they need to go through an elaborate process that can take many days.
> > > > > > >
> > > > > > > Meanwhile, if they notice that a particular node is consistently causing problems for their app, having a simple way to exclude it would be very helpful, since it gives them a way to bypass communication and process issues within their own organization.
> > > > > > >
> > > > > > > Ram
> > > > > > >
> > > > > > > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <san...@datatorrent.com> wrote:
> > > > > > > > To me both use cases appear to be generic resource management use cases. For example, a randomly rebooting node is not good for any purpose, especially long-running apps, so it is a bit of a stretch to imagine that such a node will be acceptable for some batch jobs in YARN. Such a node should be marked “Bad” or unavailable in YARN itself.
> > > > > > > >
> > > > > > > > The second use case is also a typical anti-affinity use case, which ideally should be implemented in YARN; Milind's example can also apply to non-Apex batch jobs. In any case, it looks like YARN still doesn't have it (https://issues.apache.org/jira/browse/YARN-1042), so if Apex needs it we will need to do it ourselves.
> > > > > > > >
> > > > > > > > On 11/30/16, 10:39 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
> > > > > > > > > But then, what's the solution to the 2 problem scenarios that Milind describes?
> > > > > > > > >
> > > > > > > > > Ram
> > > > > > > > >
> > > > > > > > > On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <san...@datatorrent.com> wrote:
> > > > > > > > > > I think "exclude nodes" and such is really the job of the resource manager, i.e. YARN. So I am not sure taking over some of these tasks in Apex would be very useful.
> > > > > > > > > >
> > > > > > > > > > I agree with Amol that apps should be node-neutral. Resource management in YARN together with fault tolerance in Apex should minimize the need for this feature, although I am sure one can find use cases.
> > > > > > > > > >
> > > > > > > > > > On 11/29/16, 10:41 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
> > > > > > > > > > > We do have this feature in YARN, but it applies to all applications. I am not sure if YARN has anti-affinity. This feature may be used, but in general there is a danger in an application taking over resource allocation.
> > > > > > > > > > >
> > > > > > > > > > > Another quirk is that big data apps should ideally be node-neutral. This is a good idea if we are able to carve out something where the need is app-specific.
> > > > > > > > > > >
> > > > > > > > > > > Thks
> > > > > > > > > > > Amol
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <mili...@gmail.com> wrote:
> > > > > > > > > > > > We have seen the 2 cases below where it would have been nice if Apex allowed us to exclude a node from the cluster for an application.
> > > > > > > > > > > >
> > > > > > > > > > > > 1. A node in the cluster had gone bad (it was randomly rebooting), so an Apex app should not use it; other apps could still use it, as they were batch jobs.
> > > > > > > > > > > > 2. A node is being used for a mission-critical app (could be an Apex app itself), and another Apex app, which is also mission critical, should not be using resources on that node.
> > > > > > > > > > > >
> > > > > > > > > > > > Can we have a way in which STRAM and YARN coordinate with each other to not use a set of nodes for the application? It can be done in 2 ways:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Have a list of "exclude" nodes with STRAM: when YARN allocates resources on any of these, STRAM rejects them and gets resources allocated again from YARN.
> > > > > > > > > > > > 2. Have a list of nodes that can be used for an app. This can be part of the config. However, I don't think this would be the right way to do it, as we would need support from YARN as well. Further, this might be difficult to change at runtime if need be.
> > > > > > > > > > > >
> > > > > > > > > > > > Any thoughts?
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > ~Milind bee at gee mail dot com
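Option 1 from Milind's message (STRAM keeps an exclude list, rejects containers YARN places on those nodes, and asks again) can be sketched as below. This is a hypothetical simulation, not STRAM code: `ExcludeListFilter`, `review`, and `acquire` are illustrative names, and allocations are modeled as a stream of host names rather than real YARN containers. Hadoop's `AMRMClient` also has an `updateBlacklist` method that lets an application master ask the ResourceManager to skip nodes up front, which may avoid the reject/re-request loop; check its behavior in the Hadoop version in use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "option 1": the application master keeps an exclude list
// and rejects containers placed on excluded hosts. No YARN or Apex calls are made;
// allocations are simulated as a stream of host names.
final class ExcludeListFilter {
  private final Set<String> excludedHosts;

  ExcludeListFilter(Set<String> excludedHosts) {
    this.excludedHosts = new HashSet<>(excludedHosts);
  }

  // The list is mutable, so hosts can be added or removed while the app runs.
  void exclude(String host) {
    excludedHosts.add(host);
  }

  void include(String host) {
    excludedHosts.remove(host);
  }

  // Outcome of reviewing one batch of allocated containers.
  static final class Decision {
    final List<String> accepted = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();
  }

  // Splits allocated container hosts into accepted and rejected. In a real AM each
  // rejected container would be released and a replacement requested.
  Decision review(List<String> allocatedHosts) {
    Decision decision = new Decision();
    for (String host : allocatedHosts) {
      if (excludedHosts.contains(host)) {
        decision.rejected.add(host);
      } else {
        decision.accepted.add(host);
      }
    }
    return decision;
  }

  // Keeps consuming offers until `needed` containers are accepted. The maxOffers cap
  // stops the loop if the cluster keeps handing out excluded hosts.
  List<String> acquire(int needed, Iterator<String> offers, int maxOffers) {
    List<String> accepted = new ArrayList<>();
    int seen = 0;
    while (accepted.size() < needed && seen < maxOffers && offers.hasNext()) {
      String host = offers.next();
      seen++;
      if (!excludedHosts.contains(host)) {
        accepted.add(host);
      }
    }
    return accepted;
  }
}
```

Because the exclude list is an ordinary mutable set, it also sidesteps the concern in option 2 that a fixed config is hard to change at runtime. The `maxOffers` cap reflects Amol's caution about an application taking over resource allocation: rejecting offers cannot continue indefinitely.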