Agree it should be via YARN; the poison pill would be the final barrier in the event all other mechanisms have failed -- sort of like an API call which documents that a parameter should be non-null but nevertheless checks it internally and throws an exception if it finds null.
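Here is a minimal sketch of the poison pill described above. It is written in Python purely for illustration: STRAM itself is Java, and this is not its code. The exclusion set `EXCLUDED_HOSTS` and the helpers `exclusion_diagnostic` and `startup_guard` are hypothetical names. The thread does not say where the blacklist would come from, and the sketch assumes the node name matches `socket.gethostname()`.

```python
import socket
import sys

# Hypothetical exclusion list. The thread does not name a configuration
# property for it, so a hard-coded set stands in here.
EXCLUDED_HOSTS = {"bad-node-01.example.com", "bad-node-02.example.com"}


def exclusion_diagnostic(hostname, excluded_hosts):
    """Return a diagnostic message if hostname is excluded, else None.

    Host names are compared case-insensitively, as DNS names are.
    """
    if hostname.lower() in {h.lower() for h in excluded_hosts}:
        return f"host {hostname!r} is excluded for this application; exiting"
    return None


def startup_guard(excluded_hosts):
    """Poison pill: a last-resort check at master startup.

    The scheduler is expected to keep the master off excluded nodes already;
    like a null check on a parameter documented as non-null, this only fires
    if that expectation was violated.
    """
    # Assumption: the node name used for exclusion matches gethostname();
    # on some clusters socket.getfqdn() would be the right value instead.
    diagnostic = exclusion_diagnostic(socket.gethostname(), excluded_hosts)
    if diagnostic is not None:
        print(diagnostic, file=sys.stderr)
        sys.exit(1)  # non-zero exit status signals the failure


if __name__ == "__main__":
    startup_guard(EXCLUDED_HOSTS)
    print("host not excluded; continuing startup")
```

The guard duplicates a check the scheduler is supposed to make. It does not stop YARN from placing the master on the node in the first place, which is the objection raised in the thread.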
Additionally, the poison pill helps teams that do not have control over the YARN configuration.

Ram

On Fri, Dec 2, 2016 at 7:15 AM, Amol Kekre <[email protected]> wrote:

> Excluding nodes for Stram should be done via Yarn. A poison pill is not a good way, as it induces termination for the wrong reasons.
>
> Thks
> Amol
>
> On Fri, Dec 2, 2016 at 7:13 AM, Munagala Ramanath <[email protected]> wrote:
>
> > Could STRAM include a poison pill where it simply exits with a diagnostic if its host name is blacklisted?
> >
> > Ram
> >
> > On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre <[email protected]> wrote:
> >
> > > Yarn will deploy the AM (Stram) on a node of its choice, thereby rendering any attribute within the app unenforceable in terms of not deploying the master on a node.
> > >
> > > Thks
> > > Amol
> > >
> > > On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve <[email protected]> wrote:
> > >
> > > > Additionally, this would apply to Stram as well, i.e. the master should also not be deployed on these nodes. Not sure if anti-affinity goes beyond operators.
> > > >
> > > > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve <[email protected]> wrote:
> > > >
> > > > > My previous mail explains it, but I forgot to add: -1 to covering this under anti-affinity.
> > > > >
> > > > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve <[email protected]> wrote:
> > > > >
> > > > >> While it is possible to extend anti-affinity to take care of this, I feel it will cause confusion from a user perspective. As a user, when I think about anti-affinity, what comes to mind right away is a relative relation between operators.
> > > > >>
> > > > >> On the other hand, the current ask is not that, but a relation at the application level w.r.t. a node. (Further, we might even think of extending this to the operator level, which would mean: do not deploy an operator on a particular node.)
> > > > >>
> > > > >> We would be better off clearly articulating this and allowing users to configure it separately, rather than using anti-affinity.
> > > > >>
> > > > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <[email protected]> wrote:
> > > > >>
> > > > >>> Okay, I think that serves a different purpose: detecting any node that has newly gone bad and excluding it.
> > > > >>>
> > > > >>> +1 for covering the original scenario under anti-affinity.
> > > > >>>
> > > > >>> ~ Bhupesh
> > > > >>>
> > > > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <[email protected]> wrote:
> > > > >>>
> > > > >>> > It only takes effect after failures -- there is no way to exclude a node from the get-go.
> > > > >>> >
> > > > >>> > Ram
> > > > >>> >
> > > > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <[email protected]> wrote:
> > > > >>> >
> > > > >>> > > As suggested by Sandesh, the parameter MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly what is needed.
> > > > >>> > > Why would this not work?
> > > > >>> > >
> > > > >>> > > ~ Bhupesh
> > > > >>
> > > > >> --
> > > > >> ~Milind bee at gee mail dot com
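Ram's point that MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST only takes effect after failures can be illustrated with a minimal consecutive-failure blacklist. This is a hypothetical Python sketch of the general technique, not Apex's implementation. The class `ConsecutiveFailureBlacklist`, its methods, and the rule that a success resets a host's streak are all assumptions.

```python
from collections import defaultdict


class ConsecutiveFailureBlacklist:
    """Blacklist a host once it reaches N consecutive container failures.

    Hypothetical illustration of the general technique; not Apex's code.
    """

    def __init__(self, max_consecutive_failures):
        self.max_consecutive_failures = max_consecutive_failures
        self._streak = defaultdict(int)  # host -> current consecutive failures
        self.blacklisted = set()

    def record_failure(self, host):
        self._streak[host] += 1
        if self._streak[host] >= self.max_consecutive_failures:
            self.blacklisted.add(host)

    def record_success(self, host):
        # Assumption: any successful container on the host breaks the streak.
        self._streak[host] = 0

    def is_allowed(self, host):
        return host not in self.blacklisted


tracker = ConsecutiveFailureBlacklist(max_consecutive_failures=3)
# A host with no recorded failures is allowed: nothing excludes it up front.
print(tracker.is_allowed("bad-node-01"))  # True
for _ in range(3):
    tracker.record_failure("bad-node-01")
print(tracker.is_allowed("bad-node-01"))  # False
```

Because a host enters the blacklist only after repeated failures, a node already known to be bad is still tried up to the threshold. A static exclusion list, as discussed in the thread, would reject it before any container is launched.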
