I agree, this should be built on top of the affinity work.

Thks
Amol

On Thu, Dec 1, 2016 at 1:01 PM, Pramod Immaneni <pra...@datatorrent.com>
wrote:

> I see that host locality is available as an attribute in the DAG for
> individual operators. If affinity doesn't support this today, we could
> probably add it. You could also offer setting a blacklist directly as a
> convenience function on top of affinity.
>
> On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde <sand...@datatorrent.com>
> wrote:
>
> > Pramod,
> >
> > How do we specify "don't deploy any operators on Node20" using
> > anti-affinity?
> >
> > I don't see any examples here:
> > http://apex.apache.org/docs/apex/application_development/#affinity-rules
> >
> >
> > On Thu, Dec 1, 2016 at 11:31 AM Pramod Immaneni <pra...@datatorrent.com>
> > wrote:
> >
> > > Shouldn't this already be covered by anti-affinity? Today users can
> > > specify multiple affinity rules, and for each rule they can specify
> > > positive or negative affinity, the locality, and the operator selection.
> > > If an affinity rule specifying negative affinity, node locality, and all
> > > operators does not work, then let's fix that scenario instead of creating
> > > a new option.
> > >
> > > On Thu, Dec 1, 2016 at 11:17 AM, Sandesh Hegde <sand...@datatorrent.com>
> > > wrote:
> > >
> > > > I have created a JIRA for adding a list of blacklisted nodes:
> > > > https://issues.apache.org/jira/browse/APEXCORE-584
> > > >
> > > > On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare <san...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > Yes, Ram explained to me that in practice this would be a useful
> > > > > feature for Apex devops, who typically have no control over the
> > > > > Hadoop/Yarn cluster.
> > > > >
> > > > > On 11/30/16, 9:22 PM, "Mohit Jotwani" <mo...@datatorrent.com> wrote:
> > > > >
> > > > >     This is a practical scenario where developers would be required
> > > > >     to exclude certain nodes, as those nodes might be needed for some
> > > > >     mission-critical applications. It would be good to have this
> > > > >     feature.
> > > > >
> > > > >     I understand that Stram should not get into resource management
> > > > >     and should still rely on Yarn; however, as the App Master it should
> > > > >     have the right to reject the nodes offered by Yarn and request
> > > > >     other resources.
> > > > >
> > > > >     Regards,
> > > > >     Mohit
> > > > >
> > > > >     On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde <sand...@datatorrent.com>
> > > > >     wrote:
> > > > >
> > > > >     > Apex has automatic blacklisting of troublesome nodes; please take
> > > > >     > a look at the following attributes:
> > > > >     >
> > > > >     > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > > >     > https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > > >     >
> > > > >     > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
> > > > >     >
> > > > >     > Thanks
> > > > >     >
> > > > >     >
> > > > >     >
> > > > >     > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <r...@datatorrent.com>
> > > > >     > wrote:
> > > > >     >
> > > > >     > Not sure if this is what Milind had in mind, but we often run
> > > > >     > into situations where the dev group working with Apex has no
> > > > >     > control over cluster configuration: to make any changes to the
> > > > >     > cluster they need to go through an elaborate process that can
> > > > >     > take many days.
> > > > >     >
> > > > >     > Meanwhile, if they notice that a particular node is consistently
> > > > >     > causing problems for their app, having a simple way to exclude it
> > > > >     > would be very helpful, since it gives them a way to bypass
> > > > >     > communication and process issues within their own organization.
> > > > >     >
> > > > >     > Ram
> > > > >     >
> > > > >     > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <san...@datatorrent.com>
> > > > >     > wrote:
> > > > >     >
> > > > >     > > To me both use cases appear to be generic resource management
> > > > >     > > use cases. For example, a randomly rebooting node is not good
> > > > >     > > for any purpose, especially long-running apps, so it is a bit of
> > > > >     > > a stretch to imagine that such nodes will be acceptable for some
> > > > >     > > batch jobs in Yarn. Such a node should be marked “Bad” or
> > > > >     > > “Unavailable” in Yarn itself.
> > > > >     > >
> > > > >     > > The second use case is also a typical anti-affinity use case,
> > > > >     > > which ideally should be implemented in Yarn; Milind’s example
> > > > >     > > can also apply to non-Apex batch jobs. In any case, it looks like
> > > > >     > > Yarn still doesn’t have it
> > > > >     > > (https://issues.apache.org/jira/browse/YARN-1042), so if Apex
> > > > >     > > needs it we will need to do it ourselves.
> > > > >     > >
> > > > >     > > On 11/30/16, 10:39 AM, "Munagala Ramanath" <r...@datatorrent.com> wrote:
> > > > >     > >
> > > > >     > >     But then, what's the solution to the two problem scenarios
> > > > >     > >     that Milind describes?
> > > > >     > >
> > > > >     > >     Ram
> > > > >     > >
> > > > >     > >     On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <san...@datatorrent.com>
> > > > >     > >     wrote:
> > > > >     > >
> > > > >     > >     > I think “exclude nodes” and such is really the job of the
> > > > >     > >     > resource manager, i.e. Yarn. So I am not sure taking over
> > > > >     > >     > some of these tasks in Apex would be very useful.
> > > > >     > >     >
> > > > >     > >     > I agree with Amol that apps should be node-neutral. Resource
> > > > >     > >     > management in Yarn together with fault tolerance in Apex
> > > > >     > >     > should minimize the need for this feature, although I am
> > > > >     > >     > sure one can find use cases.
> > > > >     > >     >
> > > > >     > >     >
> > > > >     > >     > On 11/29/16, 10:41 PM, "Amol Kekre" <a...@datatorrent.com> wrote:
> > > > >     > >     >
> > > > >     > >     >     We do have this feature in Yarn, but that applies to all
> > > > >     > >     >     applications. I am not sure if Yarn has anti-affinity.
> > > > >     > >     >     This feature may be used, but in general there is a
> > > > >     > >     >     danger in an application taking over resource
> > > > >     > >     >     allocation. Another quirk is that big data apps should
> > > > >     > >     >     ideally be node-neutral. This is a good idea if we are
> > > > >     > >     >     able to carve out something where the need is
> > > > >     > >     >     app-specific.
> > > > >     > >     >
> > > > >     > >     >     Thks
> > > > >     > >     >     Amol
> > > > >     > >     >
> > > > >     > >     >
> > > > >     > >     >     On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <mili...@gmail.com>
> > > > >     > >     >     wrote:
> > > > >     > >     >
> > > > >     > >     >     > We have seen two cases, described below, where it would
> > > > >     > >     >     > have been nice if Apex allowed us to exclude a node from
> > > > >     > >     >     > the cluster for an application.
> > > > >     > >     >     >
> > > > >     > >     >     > 1. A node in the cluster had gone bad (it was randomly
> > > > >     > >     >     >    rebooting), so an Apex app should not use it; other
> > > > >     > >     >     >    apps can still use it, as they were batch jobs.
> > > > >     > >     >     > 2. A node is being used for a mission-critical app
> > > > >     > >     >     >    (which could itself be an Apex app), and another
> > > > >     > >     >     >    mission-critical Apex app should not be using
> > > > >     > >     >     >    resources on that node.
> > > > >     > >     >     >
> > > > >     > >     >     > Can we have a way in which Stram and YARN coordinate
> > > > >     > >     >     > with each other to not use a set of nodes for the
> > > > >     > >     >     > application? It can be done in two ways:
> > > > >     > >     >     >
> > > > >     > >     >     > 1. Keep a list of "exclude" nodes in Stram: when YARN
> > > > >     > >     >     >    allocates resources on any of these nodes, Stram
> > > > >     > >     >     >    rejects them and gets resources allocated again
> > > > >     > >     >     >    from YARN.
> > > > >     > >     >     > 2. Keep a list of nodes that can be used for an app;
> > > > >     > >     >     >    this can be part of the config. However, I don't
> > > > >     > >     >     >    think this would be the right way to do it, as we
> > > > >     > >     >     >    would need support from YARN as well. Further, this
> > > > >     > >     >     >    might be difficult to change at runtime if need be.
> > > > >     > >     >     >
> > > > >     > >     >     > Any thoughts?
> > > > >     > >     >     >
> > > > >     > >     >     >
> > > > >     > >     >     > --
> > > > >     > >     >     > ~Milind bee at gee mail dot com
> > > > >     > >     >     >
> > > > >     > >     >
> > > > >     > >     >
> > > > >     > >     >
> > > > >     > >     >
> > > > >     > >
> > > > >     > >
> > > > >     > >
> > > > >     > >
> > > > >     >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
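
Milind's first option above (an exclude list held by Stram, which rejects containers that YARN places on listed nodes and asks for replacements) can be sketched in a few lines of Java. This is an illustrative model under stated assumptions, not Apex or YARN code: `Container`, `Allocation` and `ExcludeListFilter` are hypothetical types standing in for what an Application Master receives from the ResourceManager, and the "one replacement request per rejected container" policy is an assumption. The exclude set is thread-safe and mutable, which speaks to the concern about changing the list at runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of an application-level exclude list (option 1 in the thread).
 * Container and Allocation are stand-ins, NOT the real YARN or Apex types.
 */
public class ExcludeListFilter {

  /** A container offered by the resource manager on a given host (hypothetical). */
  public record Container(String id, String host) {}

  /** Outcome of screening one batch of allocated containers (hypothetical). */
  public record Allocation(List<Container> accepted, List<Container> released, int reRequests) {}

  // Thread-safe set so hosts can be added or removed while the app is running.
  private final Set<String> excludedHosts = ConcurrentHashMap.newKeySet();

  public ExcludeListFilter(Set<String> initialExcludes) {
    excludedHosts.addAll(initialExcludes);
  }

  public void exclude(String host) {
    excludedHosts.add(host);
  }

  public void include(String host) {
    excludedHosts.remove(host);
  }

  /** Keep containers on allowed hosts; release the rest and ask for one replacement each. */
  public Allocation screen(List<Container> allocated) {
    List<Container> accepted = new ArrayList<>();
    List<Container> released = new ArrayList<>();
    for (Container c : allocated) {
      if (excludedHosts.contains(c.host())) {
        released.add(c); // hand the container back to the resource manager
      } else {
        accepted.add(c); // operators may be deployed here
      }
    }
    // Assumed policy: one new request per rejected container keeps the total unchanged.
    return new Allocation(accepted, released, released.size());
  }

  public static void main(String[] args) {
    ExcludeListFilter filter = new ExcludeListFilter(Set.of("node20"));
    Allocation a = filter.screen(List.of(
        new Container("c1", "node07"),
        new Container("c2", "node20"),
        new Container("c3", "node11")));
    System.out.println("accepted=" + a.accepted().size()
        + " released=" + a.released().size()
        + " reRequests=" + a.reRequests());
  }
}
```

Running `main` prints `accepted=2 released=1 reRequests=1`. One weakness of reject-and-re-request is that YARN may keep offering the same node; a real Application Master could additionally report the hosts to the ResourceManager through Hadoop's `AMRMClient.updateBlacklist(additions, removals)`, so that those nodes are not offered in the first place.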

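The automatic blacklisting Sandesh points to is driven by `MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST` and `BLACKLISTED_NODE_REMOVAL_TIME_MILLIS`. The sketch below models what those names suggest, assuming that a host is blacklisted after N consecutive container failures, that a successful container resets the count, and that the host becomes eligible again once the removal time has elapsed. `FailureBlacklist` is a hypothetical class and the reset-on-success rule is an assumption; this is not the Stram implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of consecutive-failure blacklisting with timed removal.
 * The two constructor parameters mirror the intent of the Apex attributes
 * MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST and BLACKLISTED_NODE_REMOVAL_TIME_MILLIS;
 * the logic itself is an illustrative assumption.
 */
public class FailureBlacklist {
  private final int maxConsecutiveFailures;
  private final long removalTimeMillis;
  private final Map<String, Integer> consecutiveFailures = new HashMap<>();
  private final Map<String, Long> blacklistedAt = new HashMap<>();

  public FailureBlacklist(int maxConsecutiveFailures, long removalTimeMillis) {
    this.maxConsecutiveFailures = maxConsecutiveFailures;
    this.removalTimeMillis = removalTimeMillis;
  }

  /** Record a container failure on a host at time nowMillis. */
  public void onContainerFailed(String host, long nowMillis) {
    int failures = consecutiveFailures.merge(host, 1, Integer::sum);
    if (failures >= maxConsecutiveFailures) {
      blacklistedAt.putIfAbsent(host, nowMillis); // keep the original blacklisting time
    }
  }

  /** Assumption: a successful container breaks the streak of consecutive failures. */
  public void onContainerSucceeded(String host) {
    consecutiveFailures.remove(host);
  }

  /** True if the host is blacklisted at nowMillis; expired entries are dropped. */
  public boolean isBlacklisted(String host, long nowMillis) {
    Long since = blacklistedAt.get(host);
    if (since == null) {
      return false;
    }
    if (nowMillis - since >= removalTimeMillis) {
      // Removal time elapsed: give the node another chance with a clean count.
      blacklistedAt.remove(host);
      consecutiveFailures.remove(host);
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    FailureBlacklist bl = new FailureBlacklist(3, 60_000);
    bl.onContainerFailed("node20", 0);
    bl.onContainerFailed("node20", 1_000);
    System.out.println(bl.isBlacklisted("node20", 2_000));  // false: only 2 failures
    bl.onContainerFailed("node20", 2_000);
    System.out.println(bl.isBlacklisted("node20", 3_000));  // true: 3 consecutive failures
    System.out.println(bl.isBlacklisted("node20", 62_000)); // false: removal time elapsed
  }
}
```

Because a policy like this only reacts to failures, it fits Milind's first scenario (a node that keeps rebooting) but not the second, where a healthy node has to be kept free for another mission-critical application; that case still needs an explicit exclude list or a host-level anti-affinity rule.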