Would be glad to work on the documentation part!

Thanks,
Isha

-----Original Message-----
From: "Thomas Weise" <[email protected]>
Sent: 4/8/2016 9:06 PM
To: "[email protected]" <[email protected]>
Subject: Re: Support for Anti-Affinity in Apex

Generic affinity support in Apex is now official! Thanks Isha for patiently
working through a lot of feedback, experimenting with all the different
scenarios and even implementing the blacklist based workaround for the CDH
scheduler issue.

It would be nice to add to the documentation also :-)

I checked and the appropriate place seems to be somewhere around:

http://apex.incubator.apache.org/docs/apex/application_development/#streams

Isha, it would be great if you can take this up and also include the JSON
configuration example.

Thanks!





On Sun, Feb 28, 2016 at 10:57 PM, Isha Arkatkar <[email protected]>
wrote:

> Hi Chinmay,
>
>   Please find my replies inline..
>
> Thanks,
> Isha
> On Sun, Feb 28, 2016 at 10:04 PM, Chinmay Kolhatkar <[email protected]>
> wrote:
>
> > Hi Isha,
> >
> > Couple of points:
> > 1. Are the first curly brackets required? They seem to add another level
> > of hierarchy.
> >
>     [ISHA] Yes. For serialization to and from JSON, I wrapped the list of
> rules in a wrapper class.
>
> > 2. operatorRegex, operators, operatorsList all seem to specify a list of
> > operators. Do you think we can come up with something simpler rather than
> > providing more options?
> >
>    [ISHA] Regex and list are just easier ways to specify operator
> affinities than listing pairs. The regex can be any Java regex. I am not
> too good with regex, so you saw a simple example there :)
>
> > 3. Having JSON inside XML seems a little convoluted to me. I wonder if
> > there is a better way to set dt.attr.AFFINITY_RULES_SET. Maybe have it as
> > XML itself?
> >
>   [ISHA] I tried XML string codec first too. But we use Hadoop's
> Configuration object for properties.xml. And it expects property value to
> be a string. So we cannot specify value as XML. Have opened a Jira for
> this, but it is actually in Hadoop's code, so we can't have XML string
> values unless we override Configuration.
>
> > 4. For operatorRegex, which regex language do we support? Is it Java regex
> > or a perl/bash based one?
> >
>   [ISHA] It is Java regex. It simply does a pattern match on all operators in
> the DAG. It expects at least 2 matches to be found. Otherwise it logs a
> warning and does not consider the affinity rule valid.
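
The matching behaviour described above can be sketched roughly as follows. This is only an illustration, not the Apex code: the class and method names are made up, and it assumes the regex must match the whole operator name (the thread does not say whether a full or partial match is used).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexAffinitySketch {
    // Hypothetical helper: returns the operator names that fully match the regex,
    // or an empty list (after logging a warning) when fewer than two operators match.
    static List<String> matchOperators(String regex, List<String> operatorNames) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matched = new ArrayList<>();
        for (String name : operatorNames) {
            // Assumption: whole-name match; a partial match would use find() instead.
            if (pattern.matcher(name).matches()) {
                matched.add(name);
            }
        }
        if (matched.size() < 2) {
            System.err.println("WARN: regex '" + regex
                + "' matched fewer than 2 operators; ignoring affinity rule");
            return new ArrayList<>();
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("rand", "passThru", "console");
        System.out.println(matchOperators("rand|console", ops)); // two matches, rule kept
        System.out.println(matchOperators("rand", ops));         // one match, rule ignored
    }
}
```

With the operators from the example application, "rand|console" selects two operators and the rule is kept, while "rand" alone matches a single operator and the rule is dropped with a warning.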
>
>
>
> >
> > Thanks,
> > Chinmay.
> >
> >
> >
> > On Sat, Feb 27, 2016 at 3:38 AM, Isha Arkatkar <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > >   To support setting affinity rules from properties.xml, I added a JSON
> > > > string codec to read the value. I had to wrap the list of rules in an
> > > > object called AffinityRuleSet to support easy serialization.
> > >
> > > Here is an example for setting through property:
> > >
> > > <property>
> > >     <name>dt.attr.AFFINITY_RULES_SET</name>
> > >     <value>
> > >   {
> > >   "affinityRules": [
> > >     {
> > >       "operatorRegex": "rand|console",
> > >       "locality": "NODE_LOCAL",
> > >       "type": "AFFINITY",
> > >       "relaxLocality": false
> > >     },
> > >     {
> > >       "operators": {
> > >         "first": "rand",
> > >         "second": "passThru"
> > >       },
> > >       "locality": "NODE_LOCAL",
> > >       "type": "ANTI_AFFINITY",
> > >       "relaxLocality": false
> > >     },
> > >     {
> > >       "operatorsList": [
> > >         "passThru",
> > >         "passThru"
> > >       ],
> > >       "locality": "NODE_LOCAL",
> > >       "type": "ANTI_AFFINITY",
> > >       "relaxLocality": false
> > >     }
> > >   ]
> > > }
> > >     </value>
> > >   </property>
> > >
> > > Thanks,
> > > Isha
> > >
> > > On Thu, Feb 25, 2016 at 8:16 PM, Chinmay Kolhatkar <[email protected]
> >
> > > wrote:
> > >
> > > > Hi Isha,
> > > >
> > > > 2 ways looks fine.
> > > >
> > > > > Just one point: how would a user set affinity rules using the DAG context
> > > > > attribute from the properties.xml file?
> > > > Can you please provide some example for this?
> > > >
> > > > Thanks,
> > > > Chinmay.
> > > >
> > > >
> > > > On Fri, Feb 26, 2016 at 2:57 AM, Isha Arkatkar <[email protected]
> >
> > > > wrote:
> > > >
> > > > > Thanks for the input Chinmay.
> > > > >
> > > > > > The last parameter true/false is for relaxing the constraint. If true,
> > > > > > the affinity rule would be relaxed if the constraint cannot be satisfied.
> > > > > > If false, the application will keep requesting and waiting to launch
> > > > > > containers till all rules are satisfied.
> > > > >
> > > > > > Regarding the dag.setAffinity APIs: I can add those couple of APIs in
> > > > > > LogicalPlan as wrappers that internally set the DAG attributes.
> > > > > > Probably yes, that would be easier to use, to set affinity directly on
> > > > > > the dag object.
> > > > >
> > > > > > Then there would be 2 ways to set affinity rules:
> > > > >
> > > > > 1. Through DAG context attribute (Either through code or through
> > > > > properties.xml)
> > > > > 2.  By calling setAffinity on dag.
> > > > >
> > > > > Thanks,
> > > > > Isha
> > > > >
> > > > > On Wed, Feb 24, 2016 at 10:11 PM, Chinmay Kolhatkar <
> > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > > I agree with Pramod. We should probably have a single interface for
> > > > > > > specifying both, if at all possible. This is in the interest of not
> > > > > > > confusing users with two different configurations, and at the same time
> > > > > > > making the locality/affinity API appear similar to what is out there in
> > > > > > > the world at the moment.
> > > > > >
> > > > > > > It might be early to deprecate stream locality, but after some time,
> > > > > > > based on users' experience, we should consider deprecating the Stream
> > > > > > > Locality API.
> > > > > >
> > > > > > @Isha, nice APIs. Questions/suggestions regarding this:
> > > > > > > 1. What does the "false" signify for the AffinityRule constructor?
> > > > > > > 2. This is a suggestion. Instead of having a List<AffinityRule> and then
> > > > > > > setting this object as an attribute of the DAG, should we have an API in
> > > > > > > LogicalPlan like the following:
> > > > > > >
> > > > > > > dag.setAffinity(<operator1>, <operator2>, <locality_type>, false/true)
> > > > > > >
> > > > > > > dag.setAntiAffinity(<operator1>, <operator2>, false/true)
> > > > > > > // Not sure what false/true mean, added for completion. But the point
> > > > > > > // here is to remove locality for anti-affinity.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Chinmay.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Feb 25, 2016 at 5:17 AM, Pramod Immaneni <
> > > > [email protected]
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > > You could specify it by specifying the names of all operators that are
> > > > > > > > part of the stream.
> > > > > > >
> > > > > > > On Wed, Feb 24, 2016 at 3:44 PM, Amol Kekre <
> > [email protected]>
> > > > > > wrote:
> > > > > > >
> > > > > > > > > These are two different ways of specifying something similar, but not
> > > > > > > > > always the same. For example, how do I specify the following:
> > > > > > > > >
> > > > > > > > > This stream has way too high throughput. Force all operators that
> > > > > > > > > connect to it today (or in future) to be container local.
> > > > > > > >
> > > > > > > > > With stream locality I get that: as operator ports are added to the
> > > > > > > > > stream, they automatically inherit stream locality.
> > > > > > > >
> > > > > > > > Thks,
> > > > > > > > Amol
> > > > > > > >
> > > > > > > > On Wed, Feb 24, 2016 at 3:37 PM, Pramod Immaneni <
> > > > > > [email protected]
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > > To explain: the affinity support below seems to be a super-set that
> > > > > > > > > > can consume the stream locality setting, whether that was the
> > > > > > > > > > original intention or not.
> > > > > > > > >
> > > > > > > > > On Wed, Feb 24, 2016 at 3:35 PM, Pramod Immaneni <
> > > > > > > [email protected]
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > Why would we want two different ways of specifying the same thing?
> > > > > > > > > >
> > > > > > > > > > On Wed, Feb 24, 2016 at 3:29 PM, Amol Kekre <
> > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > >> Stream locality is a stream attribute. Affinity is an operator
> > > > > > > > > > >> attribute. Technically they may not be connected. Though operator
> > > > > > > > > > >> affinity can be extended to specify a situation that implicitly
> > > > > > > > > > >> covers stream locality, it may be better to let the user say the
> > > > > > > > > > >> two following statements independently.
> > > > > > > > > > >>
> > > > > > > > > > >> - This stream is heavy on I/O and I want it to be ..._local
> > > > > > > > > > >>   // This statement is independent of future additions of
> > > > > > > > > > >>   // operators to this stream
> > > > > > > > > > >> - I need these logical operators to not be (or be) on the same node
> > > > > > > > > > >>
> > > > > > > > > > >> Do note that container_local or node_local stream localities only
> > > > > > > > > > >> relate to connected operators.
> > > > > > > > > >>
> > > > > > > > > >> Thks
> > > > > > > > > >> Amol
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> On Wed, Feb 24, 2016 at 1:54 PM, Pramod Immaneni <
> > > > > > > > > [email protected]>
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > > >> > Would it make sense to deprecate stream locality in favor of this?
> > > > > > > > > >> >
> > > > > > > > > >> > On Tue, Feb 23, 2016 at 6:05 PM, Isha Arkatkar <
> > > > > > > > [email protected]>
> > > > > > > > > >> > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Hi Pramod,
> > > > > > > > > >> > >
> > > > > > > > > > >> > >    We can have a list of operators or a regex to specify the
> > > > > > > > > > >> > > operators list for an affinity rule. The internal implementation
> > > > > > > > > > >> > > will translate both of these into a list of pairs, since having
> > > > > > > > > > >> > > pairs of operators helps in the validation phase.
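
A rough sketch of what translating an operator list into pairs could look like is below. It is not taken from the pull request: the class and method names are hypothetical, and it assumes every unordered combination of positions in the list becomes one pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairExpansionSketch {
    // Hypothetical helper: expands a list of operator names into every
    // unordered pair of list positions (i < j).
    static List<String[]> toPairs(List<String> operators) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < operators.size(); i++) {
            for (int j = i + 1; j < operators.size(); j++) {
                pairs.add(new String[] {operators.get(i), operators.get(j)});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Three operators expand into three pairs.
        for (String[] pair : toPairs(Arrays.asList("rand", "passThru", "console"))) {
            System.out.println(pair[0] + " <-> " + pair[1]);
        }
    }
}
```

Because the pairing is by position rather than by name, a list that repeats one operator name still yields a pair of that operator with itself, which is how anti-affinity between partitions of a single operator is expressed elsewhere in this thread.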
> > > > > > > > > >> > >
> > > > > > > > > > >> > >  And yes, the validation phase will catch conflicting locality or
> > > > > > > > > > >> > > affinity rules and throw a validation exception.
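
To illustrate the kind of conflict such a validation step could reject, here is a minimal sketch. It is an assumption-laden stand-in rather than the actual validation code: the names are invented, it only compares rule type per operator pair (locality and stream settings are ignored), and IllegalArgumentException stands in for whatever validation exception Apex throws.

```java
import java.util.HashMap;
import java.util.Map;

public class AffinityConflictSketch {
    enum Type { AFFINITY, ANTI_AFFINITY }

    // Rule type already registered for each operator pair, keyed order-independently.
    private final Map<String, Type> seen = new HashMap<>();

    // Builds the same key for (a, b) and (b, a).
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    // Registers a rule; throws if the same pair already has a rule of the opposite type.
    void addRule(Type type, String first, String second) {
        Type previous = seen.putIfAbsent(key(first, second), type);
        if (previous != null && previous != type) {
            throw new IllegalArgumentException(
                "Conflicting affinity rules for operators " + first + " and " + second);
        }
    }

    public static void main(String[] args) {
        AffinityConflictSketch validator = new AffinityConflictSketch();
        validator.addRule(Type.AFFINITY, "rand", "console");
        try {
            // Same pair in reverse order with the opposite type: rejected.
            validator.addRule(Type.ANTI_AFFINITY, "console", "rand");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```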
> > > > > > > > > >> > >
> > > > > > > > > > >> > >  Thanks for the input regarding unifiers. It is probably alright
> > > > > > > > > > >> > > to not include them in the affinity settings. Will leave the
> > > > > > > > > > >> > > current implementation as is.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Thanks!
> > > > > > > > > >> > > Isha
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Tue, Feb 23, 2016 at 5:56 PM, Pramod Immaneni <
> > > > > > > > > >> [email protected]
> > > > > > > > > >> > >
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Isha,
> > > > > > > > > >> > > >
> > > > > > > > > > >> > > > If the implementation would (eventually) support affinity between
> > > > > > > > > > >> > > > more than 2 logical operators, instead of relying on specifying it
> > > > > > > > > > >> > > > as multiple rules, why not make provision in the API to specify
> > > > > > > > > > >> > > > more than 2 logical operators? If there is a stream locality rule
> > > > > > > > > > >> > > > that conflicts with the affinity rule, will the validation catch
> > > > > > > > > > >> > > > this? With unifiers, the common case of MxN locality is already
> > > > > > > > > > >> > > > chosen based on the downstream operator. In other cases, such as
> > > > > > > > > > >> > > > the single downstream or cascade case, I am not sure it is all
> > > > > > > > > > >> > > > that important to have it.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Thanks
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > On Tue, Feb 23, 2016 at 5:37 PM, Isha Arkatkar <
> > > > > > > > > >> [email protected]>
> > > > > > > > > >> > > > wrote:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > > Hi all,
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >     I have opened a review-only pull request for handling affinity
> > > > > > > > > > >> > > > > rules in Apex:
> > > > > > > > > > >> > > > > https://github.com/apache/incubator-apex-core/pull/234
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >  Just wanted to confirm with everyone whether the APIs for
> > > > > > > > > > >> > > > > specifying affinity rules look alright. Also, wanted to get views
> > > > > > > > > > >> > > > > on how unifiers should be handled in case of operator affinity
> > > > > > > > > > >> > > > > rules. More details follow:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >  I have added a list of affinity rules as an attribute in the DAG
> > > > > > > > > > >> > > > > context. For now, AffinityRule is specified for operator pairs.
> > > > > > > > > > >> > > > > We can add regex support in the next iteration.
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Here is sample usage:
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >    List<AffinityRule> rules = new ArrayList<>();
> > > > > > > > > > >> > > > >    // To add node locality between 2 not connected operators,
> > > > > > > > > > >> > > > >    // rand and console:
> > > > > > > > > > >> > > > >    rules.add(new AffinityRule(Type.AFFINITY,
> > > > > > > > > > >> > > > >        new OperatorPair("rand", "console"),
> > > > > > > > > > >> > > > >        Locality.NODE_LOCAL, false));
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >    // To add anti-affinity between all partitions of rand operator
> > > > > > > > > > >> > > > >    // with all partitions of passThru operator:
> > > > > > > > > > >> > > > >    rules.add(new AffinityRule(Type.ANTI_AFFINITY,
> > > > > > > > > > >> > > > >        new OperatorPair("rand", "passThru"),
> > > > > > > > > > >> > > > >        Locality.NODE_LOCAL, false));
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >    // To add anti-affinity between partitions of passThru operator,
> > > > > > > > > > >> > > > >    // give same operator name in pair:
> > > > > > > > > > >> > > > >    rules.add(new AffinityRule(Type.ANTI_AFFINITY,
> > > > > > > > > > >> > > > >        new OperatorPair("passThru", "passThru"),
> > > > > > > > > > >> > > > >        Locality.NODE_LOCAL, false));
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >    // Set the rules in dag context
> > > > > > > > > > >> > > > >    dag.setAttribute(DAGContext.AFFINITY_RULES, rules);
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Please find a sample application using these affinity rules here:
> > > > > > > > > > >> > > > > https://github.com/ishark/Apex-Samples/tree/master/affinity-example
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > The actual implementation of affinity rules heavily depends on
> > > > > > > > > > >> > > > > node-specific requests, which were already being handled in
> > > > > > > > > > >> > > > > StreamingAppMasterService. To handle node requests for Cloudera,
> > > > > > > > > > >> > > > > I have added an override to blacklist all other nodes except the
> > > > > > > > > > >> > > > > ones where the container request is to be issued.
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > There is one open question I wanted to bring up: in case of
> > > > > > > > > > >> > > > > partitioned operators, should the unifiers also follow affinity
> > > > > > > > > > >> > > > > or anti-affinity rules? Or should they be independent? For now,
> > > > > > > > > > >> > > > > I have kept them independent and only the actual operators follow
> > > > > > > > > > >> > > > > affinity rules. Please suggest what would make more sense from
> > > > > > > > > > >> > > > > the user's specification perspective.
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Thanks,
> > > > > > > > > >> > > > > Isha
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > On Mon, Feb 1, 2016 at 2:12 PM, Isha Arkatkar <
> > > > > > > > > >> [email protected]>
> > > > > > > > > >> > > > > wrote:
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > > Hi folks,
> > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > >    Summarizing the proposal for affinity/anti-affinity rules in
> > > > > > > > > > >> > > > > > Apex as per discussions in this mail thread. Please suggest if I
> > > > > > > > > > >> > > > > > missed something.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > *  For configuration:*
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > >   - We will have application level affinity/anti-affinity rules.
> > > > > > > > > > >> > > > > > For the first iteration, we can support (anti-)affinity among
> > > > > > > > > > >> > > > > > operators within a single application. We can revisit it in the
> > > > > > > > > > >> > > > > > next iteration to support it across applications.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > >  -  Each rule will consist of 4 components:
> > > > > > > > > > >> > > > > >    <List of operators>, AFFINITY/ANTI-AFFINITY, STRICT/RELAXED
> > > > > > > > > > >> > > > > >    policy, CONTAINER/NODE/RACK
> > > > > > > > > > >> > > > > > I have checked that Apex supports container locality between 2
> > > > > > > > > > >> > > > > > operators that are not connected by a Stream. Though did not find

[The entire original message is not included.]
