Thomas,

Thanks. I'll work on the optimization as you suggested.
-Chinmay.

On Mon, Feb 27, 2017 at 4:16 AM, Thomas Weise <t...@apache.org> wrote:

> Chinmay,
>
> I would recommend collapsing ParDo sequences to the maximum extent
> possible using THREAD_LOCAL affinity. The Apex runner has a configuration
> file option that can be used to tune the chaining when needed
> (https://issues.apache.org/jira/browse/BEAM-980). What you build into the
> pipeline translation should just be the default behavior.
>
> Note that in cases where operators have multiple inputs, you may not be
> able to use THREAD_LOCAL and may have to use CONTAINER_LOCAL instead.
>
> Thanks,
> Thomas
>
> On Wed, Feb 22, 2017 at 10:59 PM, Chinmay Kolhatkar <chin...@apache.org>
> wrote:
>
> > Dear Community,
> >
> > I'm working on BEAM-831 to implement ParDo chaining for the Apache Apex
> > Runner.
> >
> > As suggested on Jira, chaining needs to be done using the stream
> > locality of the Apache Apex engine.
> >
> > I got some links from Eugene Kirpichov on the Jira. I'm currently
> > focusing on the producer-consumer fusion optimization. I'm unsure how
> > useful sibling fusion would be for the Apex Runner as of now.
> >
> > For producer-consumer fusion, I am able to identify which stages are
> > ParDos. The only thing I'm not sure about is when to stop merging
> > ParDos, i.e. if the DAG is ParDo A -> ParDo B -> ParDo C -> ParDo D,
> > then at times it might be more efficient to merge only B & C and not
> > merge all of them.
> >
> > How should this decision be made? Is there any reference available
> > for it?
> >
> > Please suggest.
> >
> > Thanks,
> > Chinmay.
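
For illustration only, here is a minimal sketch of the default described in the quoted reply: chain every ParDo-to-ParDo stream with THREAD_LOCAL, and fall back to CONTAINER_LOCAL when the consuming stage has more than one input. Everything here (ChainingSketch, Stage, the Locality enum, localityFor, plan) is a hypothetical stand-in and not the Beam or Apex API. The multiple-input rule is a simplification of "may not be able to use THREAD_LOCAL", and the config-file override from BEAM-980 is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types come from Beam or Apex.
class ChainingSketch {

    // Stand-in for the engine's stream locality; DEFAULT means "leave unset".
    enum Locality { THREAD_LOCAL, CONTAINER_LOCAL, DEFAULT }

    // A translated stage: its name, whether it is a ParDo, and its upstream stage names.
    static final class Stage {
        final String name;
        final boolean isParDo;
        final List<String> inputs;

        Stage(String name, boolean isParDo, List<String> inputs) {
            this.name = name;
            this.isParDo = isParDo;
            this.inputs = inputs;
        }
    }

    // Default rule (assumed): only ParDo -> ParDo streams are chained.
    // A consumer with several inputs gets CONTAINER_LOCAL instead of THREAD_LOCAL.
    static Locality localityFor(Stage producer, Stage consumer) {
        if (!producer.isParDo || !consumer.isParDo) {
            return Locality.DEFAULT;
        }
        return consumer.inputs.size() > 1 ? Locality.CONTAINER_LOCAL : Locality.THREAD_LOCAL;
    }

    // Computes a locality for every producer -> consumer stream, keyed as "A->B".
    static Map<String, Locality> plan(List<Stage> stages) {
        Map<String, Stage> byName = new LinkedHashMap<>();
        for (Stage s : stages) {
            byName.put(s.name, s);
        }
        Map<String, Locality> result = new LinkedHashMap<>();
        for (Stage consumer : stages) {
            for (String input : consumer.inputs) {
                Stage producer = byName.get(input);
                if (producer == null) {
                    continue; // input comes from outside the listed stages
                }
                result.put(input + "->" + consumer.name, localityFor(producer, consumer));
            }
        }
        return result;
    }
}
```

With the chain ParDo A -> ParDo B -> ParDo C -> ParDo D from the original question, this sketch marks all three streams THREAD_LOCAL, i.e. it merges everything by default and leaves any selective un-chaining to configuration.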