I have created a JIRA for adding a list of blacklisted nodes:
https://issues.apache.org/jira/browse/APEXCORE-584

On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare <san...@datatorrent.com>
wrote:

> Yes, Ram explained to me that in practice this would be a useful feature
> for Apex devops who typically have no control over Hadoop/Yarn cluster.
>
> On 11/30/16, 9:22 PM, "Mohit Jotwani" <mo...@datatorrent.com> wrote:
>
>     This is a practical scenario where developers would be required to
>     exclude certain nodes as they might be required for some mission
>     critical applications. It would be good to have this feature.
>
>     I understand that Stram should not get into resourcing and should still
>     rely on Yarn. However, as the App Master, it should have the right to
>     reject the nodes offered by Yarn and request other resources.
>
>     Regards,
>     Mohit
>
>     On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde
>     <sand...@datatorrent.com> wrote:
>
>     > Apex has automatic blacklisting of troublesome nodes. Please take a
>     > look at the following attributes:
>     >
>     > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
>     > https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
>     >
>     > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
>     >
>     > Thanks
>     >
>     >
>     >
>     > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath
>     > <r...@datatorrent.com> wrote:
>     >
>     > Not sure if this is what Milind had in mind, but we often run into
>     > situations where the dev group working with Apex has no control over
>     > cluster configuration -- to make any changes to the cluster they need
>     > to go through an elaborate process that can take many days.
>     >
>     > Meanwhile, if they notice that a particular node is consistently
>     > causing problems for their app, having a simple way to exclude it
>     > would be very helpful since it gives them a way to bypass
>     > communication and process issues within their own organization.
>     >
>     > Ram
>     >
>     > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare
>     > <san...@datatorrent.com> wrote:
>     >
>     > > To me both use cases appear to be generic resource management use
>     > > cases. For example, a randomly rebooting node is not good for any
>     > > purpose, esp. long running apps, so it is a bit of a stretch to
>     > > imagine that these nodes will be acceptable for some batch jobs in
>     > > Yarn. So such a node should be marked “Bad” or Unavailable in Yarn
>     > > itself.
>     > >
>     > > The second use case is also a typical anti-affinity use case, which
>     > > ideally should be implemented in Yarn – Milind’s example can also
>     > > apply to non-Apex batch jobs. In any case it looks like Yarn still
>     > > doesn’t have it (https://issues.apache.org/jira/browse/YARN-1042),
>     > > so if Apex needs it we will need to do it ourselves.
>     > >
>     > > On 11/30/16, 10:39 AM, "Munagala Ramanath" <r...@datatorrent.com>
>     > > wrote:
>     > >
>     > >     But then, what's the solution to the 2 problem scenarios that
>     > >     Milind describes?
>     > >
>     > >     Ram
>     > >
>     > >     On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare
>     > >     <san...@datatorrent.com> wrote:
>     > >
>     > >     > I think “exclude nodes” and such is really the job of the
>     > >     > resource manager, i.e. Yarn. So I am not sure taking over some
>     > >     > of these tasks in Apex would be very useful.
>     > >     >
>     > >     > I agree with Amol that apps should be node neutral. Resource
>     > >     > management in Yarn together with fault tolerance in Apex should
>     > >     > minimize the need for this feature, although I am sure one can
>     > >     > find use cases.
>     > >     >
>     > >     >
>     > >     > On 11/29/16, 10:41 PM, "Amol Kekre" <a...@datatorrent.com>
>     > >     > wrote:
>     > >     >
>     > >     >     We do have this feature in Yarn, but that applies to all
>     > >     >     applications. I am not sure if Yarn has anti-affinity. This
>     > >     >     feature may be used, but in general there is danger in an
>     > >     >     application taking over resource allocation. Another quirk
>     > >     >     is that big data apps should ideally be node-neutral. This
>     > >     >     is a good idea if we are able to carve out something where
>     > >     >     the need is app specific.
>     > >     >
>     > >     >     Thks
>     > >     >     Amol
>     > >     >
>     > >     >
>     > >     >     On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve
>     > >     >     <mili...@gmail.com> wrote:
>     > >     >
>     > >     >     > We have seen the 2 cases mentioned below, where it would
>     > >     >     > have been nice if Apex allowed us to exclude a node from
>     > >     >     > the cluster for an application.
>     > >     >     >
>     > >     >     > 1. A node in the cluster had gone bad (was randomly
>     > >     >     > rebooting) and so an Apex app should not use it - other
>     > >     >     > apps can use it as they were batch jobs.
>     > >     >     > 2. A node is being used for a mission critical app (could
>     > >     >     > be an Apex app itself), but another Apex app which is
>     > >     >     > mission critical should not be using resources on that
>     > >     >     > node.
>     > >     >     >
>     > >     >     > Can we have a way in which Stram and YARN can coordinate
>     > >     >     > with each other to not use a set of nodes for the
>     > >     >     > application? It can be done in 2 ways -
>     > >     >     >
>     > >     >     > 1. Have a list of "exclude" nodes with Stram - when YARN
>     > >     >     > allocates resources on any of these, Stram rejects them
>     > >     >     > and gets resources allocated again from YARN.
>     > >     >     > 2. Have a list of nodes that can be used for an app - this
>     > >     >     > can be a part of config. However, I don't think this would
>     > >     >     > be the right way to do it, as we will need support from
>     > >     >     > YARN as well. Further, this might be difficult to change
>     > >     >     > at runtime if need be.
>     > >     >     >
>     > >     >     > Any thoughts?
>     > >     >     >
>     > >     >     >
>     > >     >     > --
>     > >     >     > ~Milind bee at gee mail dot com
>     > >     >     >
>     > >     >
>     > >     >
>     > >     >
>     > >     >
>     > >
>     > >
>     > >
>     > >
>     >
>
>
>
>
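For reference, the first option Milind describes in the thread (Stram keeps an "exclude" list, rejects containers that YARN allocates on those hosts, and asks again) could be sketched roughly as below. This is only an illustration under assumptions: `ExcludedNodeFilter` and its methods are hypothetical names, not part of Apex or YARN, and container offers are modeled as plain host-name strings rather than real YARN container objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not an Apex or YARN class): splits the hosts of
// offered containers into those the app master keeps and those it would
// release and re-request.
final class ExcludedNodeFilter {
    private final Set<String> excludedHosts;

    ExcludedNodeFilter(Set<String> excludedHosts) {
        // Defensive copy so later changes by the caller do not leak in.
        this.excludedHosts = new HashSet<>(excludedHosts);
    }

    // Hosts that are not on the exclude list, in the order they were offered.
    List<String> accepted(List<String> offeredHosts) {
        List<String> result = new ArrayList<>();
        for (String host : offeredHosts) {
            if (!excludedHosts.contains(host)) {
                result.add(host);
            }
        }
        return result;
    }

    // Hosts on the exclude list; each one stands for a container that would
    // be handed back, followed by a fresh request to the resource manager.
    List<String> rejected(List<String> offeredHosts) {
        List<String> result = new ArrayList<>();
        for (String host : offeredHosts) {
            if (excludedHosts.contains(host)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ExcludedNodeFilter filter =
            new ExcludedNodeFilter(new HashSet<>(Arrays.asList("node3")));
        List<String> offers = Arrays.asList("node1", "node3", "node2");
        System.out.println("keep:    " + filter.accepted(offers));
        System.out.println("release: " + filter.rejected(offers));
    }
}
```

In a real app master the rejected entries would translate into releasing those containers and issuing new requests. The sketch leaves that out, along with any limit on how many times a request is retried.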
