Not sure if this is what Milind had in mind but we often run into situations where the dev group working with Apex has no control over cluster configuration -- to make any changes to the cluster they need to go through an elaborate process that can take many days.
Meanwhile, if they notice that a particular node is consistently causing problems for their app, having a simple way to exclude it would be very helpful since it gives them a way to bypass communication and process issues within their own organization.

Ram

On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <san...@datatorrent.com> wrote:

> To me both use cases appear to be generic resource management use cases. For example, a randomly rebooting node is not good for any purpose, esp. long running apps, so it is a bit of a stretch to imagine that these nodes will be acceptable for some batch jobs in Yarn. So such a node should be marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also a typical anti-affinity use case which ideally should be implemented in Yarn – Milind’s example can also apply to non-Apex batch jobs. In any case it looks like Yarn still doesn’t have it (https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
>
> > But then, what's the solution to the 2 problem scenarios that Milind describes?
> >
> > Ram
> >
> > On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <san...@datatorrent.com> wrote:
> >
> > > I think “exclude nodes” and such is really the job of the resource manager, i.e. Yarn. So I am not sure taking over some of these tasks in Apex would be very useful.
> > >
> > > I agree with Amol that apps should be node neutral. Resource management in Yarn together with fault tolerance in Apex should minimize the need for this feature, although I am sure one can find use cases.
> > >
> > > On 11/29/16, 10:41 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
> > >
> > > > We do have this feature in Yarn, but that applies to all applications. I am not sure if Yarn has anti-affinity. This feature may be used, but in general there is danger in an application taking over resource allocation. Another quirk is that big data apps should ideally be node-neutral. This is a good idea, if we are able to carve out something where the need is app specific.
> > > >
> > > > Thks
> > > > Amol
> > > >
> > > > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <mili...@gmail.com> wrote:
> > > >
> > > > > We have seen 2 cases mentioned below, where it would have been nice if Apex allowed us to exclude a node from the cluster for an application.
> > > > >
> > > > > 1. A node in the cluster had gone bad (was randomly rebooting) and so an Apex app should not use it - other apps can use it as they were batch jobs.
> > > > > 2. A node is being used for a mission critical app (could be an Apex app itself), but another Apex app which is mission critical should not be using resources on that node.
> > > > >
> > > > > Can we have a way in which Stram and YARN can coordinate between each other to not use a set of nodes for the application? It can be done in 2 ways:
> > > > >
> > > > > 1. Have a list of "exclude" nodes with Stram - when YARN allocates resources on any of these, STRAM rejects and gets resources allocated again from YARN.
> > > > > 2. Have a list of nodes that can be used for an app - this can be a part of config. However, I don't think this would be a right way to do so as we will need support from YARN as well. Further, this might be difficult to change at runtime if need be.
> > > > >
> > > > > Any thoughts?
> > > > >
> > > > > --
> > > > > ~Milind bee at gee mail dot com
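
For reference, Milind's first option (Stram keeps an "exclude" list, rejects anything YARN allocates on those nodes, and asks again) could be sketched roughly as below. This is only an illustration of the reject-and-re-request loop, not Stram or YARN code: `Container`, `ExcludeListAllocator`, `make_fake_rm` and the `max_attempts` limit are all hypothetical names made up for this sketch, and the resource manager is faked with a simple round-robin over hosts.

```python
from dataclasses import dataclass
from itertools import count, cycle
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Container:
    """Hypothetical stand-in for a container granted by the resource manager."""
    container_id: str
    host: str


class ExcludeListAllocator:
    """Hypothetical application-master-side filter: hand back containers that
    land on excluded hosts and re-request until enough acceptable ones arrive."""

    def __init__(self, excluded_hosts: Set[str], max_attempts: int = 5):
        self.excluded_hosts = set(excluded_hosts)
        # Assumption: bound the retries so a cluster that only offers excluded
        # nodes cannot keep the allocator looping forever.
        self.max_attempts = max_attempts

    def acquire(
        self,
        needed: int,
        request_containers: Callable[[int], List[Container]],
    ) -> Tuple[List[Container], List[Container]]:
        """Return (accepted, released). `request_containers(n)` stands in for
        asking the resource manager for n more containers."""
        accepted: List[Container] = []
        released: List[Container] = []
        for _ in range(self.max_attempts):
            missing = needed - len(accepted)
            if missing <= 0:
                break
            for container in request_containers(missing):
                if container.host in self.excluded_hosts or len(accepted) >= needed:
                    # Reject: on an excluded node, or surplus to the request.
                    released.append(container)
                else:
                    accepted.append(container)
        return accepted, released


def make_fake_rm(hosts: List[str]) -> Callable[[int], List[Container]]:
    """Fake resource manager that hands out containers round-robin over hosts."""
    host_cycle = cycle(hosts)
    ids = count(1)

    def request(n: int) -> List[Container]:
        return [Container(f"c{next(ids)}", next(host_cycle)) for _ in range(n)]

    return request


if __name__ == "__main__":
    allocator = ExcludeListAllocator(excluded_hosts={"node2"})
    accepted, released = allocator.acquire(3, make_fake_rm(["node1", "node2", "node3"]))
    print("accepted:", [(c.container_id, c.host) for c in accepted])
    print("released:", [(c.container_id, c.host) for c in released])
```

The retry bound is a choice made for the sketch: if the resource manager keeps offering only excluded nodes, the app ends up with fewer containers than it asked for rather than retrying indefinitely. Nothing here addresses the concern raised in the thread that this kind of exclusion arguably belongs in the resource manager itself.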