Re: BEAM-831 ParDo chaining for Apex runner

Chinmay Kolhatkar Sun, 26 Feb 2017 22:23:28 -0800

Thomas,

Thanks. I'll work on optimization as suggested by you.


-Chinmay.


On Mon, Feb 27, 2017 at 4:16 AM, Thomas Weise <t...@apache.org> wrote:

> Chinmay,
>
> I would recommend to collapse ParDo sequences to the maximum extend
> possible using THREAD_LOCAL affinity. The Apex runner has a configuration
> file option that can be used to tune the chaining when needed (
> https://issues.apache.org/jira/browse/BEAM-980). What you build into the
> pipeline translation should just be the default behavior.
>
> Note that in cases where operators have multiple inputs, you may not be
> able to use THREAD_LOCAL and may have to use CONTAINER_LOCAL instead.
>
> Thanks,
> Thomas
>
>
> On Wed, Feb 22, 2017 at 10:59 PM, Chinmay Kolhatkar <chin...@apache.org>
> wrote:
>
> > Dear Community,
> >
> > I'm working on BEAM-831 to implement ParDo chaining for Apache Apex
> Runner.
> >
> > As suggested on Jira, chaining needs to be done using Stream locality of
> > Apache Apex engine.
> >
> > I got some links from Eugene Kirpichov on the Jira. I'm currently
> focusing
> > on producer-consumer fusion optimization. I'm unsure how much good it is
> to
> > do sibling fusion for Apex Runner as of now.
> >
> > For producer-consumer fusion, I am able to identify which stages are
> > ParDos.
> > Only thing that I'm not sure about is when to stop the merging of
> ParDos...
> > i.e. if the DAG is like ParDo A -> ParDo B -> ParDo C -> ParDo D.
> > Then at time it might be efficient to merge only B & C and not merge all
> of
> > them...
> >
> > How should this decision be made? Any reference for available for it?
> >
> > Please suggest.
> >
> > Thanks,
> > Chinmay.
> >
>

Re: BEAM-831 ParDo chaining for Apex runner

Reply via email to