This is a practical scenario: developers may need to exclude certain nodes
because those nodes are reserved for some mission critical applications.
It would be good to have this feature.

I understand that Stram should not get into resource management and should
still rely on Yarn. However, as the App Master it should have the right to
reject the nodes offered by Yarn and request other resources.
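
To make the idea concrete, here is a rough sketch of the check the App Master
could run on each batch of containers it is offered. This is not Apex or YARN
code: ExcludeNodeFilter, Offer, Result and partition are names made up for
illustration, and Offer is only a stand-in for a YARN container. In a real
Stram the rejected entries would presumably be handed back through YARN's
AMRMClient (releaseAssignedContainer, and possibly updateBlacklist so the
same nodes are not offered again), but I have not tried that wiring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: split the containers offered by YARN into keep/release lists. */
class ExcludeNodeFilter {

    /** Stand-in for a YARN container; only the fields this sketch needs. */
    static final class Offer {
        final String containerId;
        final String host;

        Offer(String containerId, String host) {
            this.containerId = containerId;
            this.host = host;
        }
    }

    /** Outcome of one allocation round. */
    static final class Result {
        final List<Offer> accepted = new ArrayList<>();
        final List<Offer> rejected = new ArrayList<>();
    }

    // Hosts the application must not run on (assumed to come from app config).
    private final Set<String> excludedHosts;

    ExcludeNodeFilter(Set<String> excludedHosts) {
        this.excludedHosts = new HashSet<>(excludedHosts);
    }

    /** Accept offers on allowed hosts; mark offers on excluded hosts for release. */
    Result partition(List<Offer> offers) {
        Result result = new Result();
        for (Offer offer : offers) {
            if (excludedHosts.contains(offer.host)) {
                result.rejected.add(offer);
            } else {
                result.accepted.add(offer);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ExcludeNodeFilter filter =
            new ExcludeNodeFilter(new HashSet<>(Arrays.asList("node7")));
        Result result = filter.partition(Arrays.asList(
            new Offer("c1", "node3"),
            new Offer("c2", "node7")));
        // The App Master would keep the accepted containers, release the
        // rejected ones, and ask YARN for replacements.
        System.out.println("accepted=" + result.accepted.size()
            + " rejected=" + result.rejected.size());
    }
}
```

The check itself is trivial; the open question is how often Yarn would keep
offering the same excluded node before the request is satisfied elsewhere.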

Regards,
Mohit

On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde <sand...@datatorrent.com>
wrote:

> Apex has automatic blacklisting of troublesome nodes. Please take a
> look at the following attributes:
>
> MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
>
> BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
>
> Thanks
>
>
>
> On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
> Not sure if this is what Milind had in mind but we often run into
> situations where the dev group working with Apex has no control over
> cluster configuration -- to make any changes to the cluster they need to
> go through an elaborate process that can take many days.
>
> Meanwhile, if they notice that a particular node is consistently causing
> problems for their app, having a simple way to exclude it would be very
> helpful since it gives them a way to bypass communication and process
> issues within their own organization.
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <san...@datatorrent.com>
> wrote:
>
> > To me both use cases appear to be generic resource management use cases.
> > For example, a randomly rebooting node is not good for any purpose esp.
> > long running apps so it is a bit of a stretch to imagine that these nodes
> > will be acceptable for some batch jobs in Yarn. So such a node should be
> > marked “Bad” or Unavailable in Yarn itself.
> >
> > The second use case is also a typical anti-affinity use case, which ideally
> > should be implemented in Yarn – Milind’s example can also apply to
> > non-Apex batch jobs. In any case it looks like Yarn still doesn’t have it
> > (https://issues.apache.org/jira/browse/YARN-1042), so if Apex needs it we
> > will need to do it ourselves.
> >
> > On 11/30/16, 10:39 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
> >
> >     But then, what's the solution to the 2 problem scenarios that Milind
> >     describes ?
> >
> >     Ram
> >
> >     On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <san...@datatorrent.com> wrote:
> >
> >     > I think “exclude nodes” and such is really the job of the resource
> >     > manager, i.e. Yarn. So I am not sure taking over some of these tasks
> >     > in Apex would be very useful.
> >     >
> >     > I agree with Amol that apps should be node neutral. Resource
> >     > management in Yarn together with fault tolerance in Apex should
> >     > minimize the need for this feature although I am sure one can find
> >     > use cases.
> >     >
> >     >
> >     > On 11/29/16, 10:41 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
> >     >
> >     >     We do have this feature in Yarn, but that applies to all
> >     >     applications. I am not sure if Yarn has anti-affinity. This
> >     >     feature may be used, but in general there is danger in an
> >     >     application taking over resource allocation. Another quirk is
> >     >     that big data apps should ideally be node-neutral. This is a
> >     >     good idea, if we are able to carve out something where the need
> >     >     is app specific.
> >     >
> >     >     Thks
> >     >     Amol
> >     >
> >     >
> >     >     On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <mili...@gmail.com> wrote:
> >     >
> >     >     > We have seen the 2 cases mentioned below, where it would have
> >     >     > been nice if Apex allowed us to exclude a node from the
> >     >     > cluster for an application.
> >     >     >
> >     >     > 1. A node in the cluster had gone bad (was randomly rebooting)
> >     >     > and so an Apex app should not use it - other apps can use it
> >     >     > as they were batch jobs.
> >     >     > 2. A node is being used for a mission critical app (could be
> >     >     > an Apex app itself), but another Apex app which is mission
> >     >     > critical should not be using resources on that node.
> >     >     >
> >     >     > Can we have a way in which Stram and YARN can coordinate with
> >     >     > each other to not use a set of nodes for the application? It
> >     >     > can be done in 2 ways:
> >     >     >
> >     >     > 1. Have a list of "exclude" nodes with Stram - when YARN
> >     >     > allocates resources on any of these, Stram rejects them and
> >     >     > gets resources allocated again from YARN.
> >     >     > 2. Have a list of nodes that can be used for an app - this can
> >     >     > be a part of config. However, I don't think this would be the
> >     >     > right way to do it, as we will need support from YARN as well.
> >     >     > Further, this might be difficult to change at runtime if need
> >     >     > be.
> >     >     >
> >     >     > Any thoughts?
> >     >     >
> >     >     >
> >     >     > --
> >     >     > ~Milind bee at gee mail dot com
> >     >     >
> >     >
> >     >
> >     >
> >     >
> >
> >
> >
> >
>
