BEAM-831 ParDo chaining for Apex runner

Chinmay Kolhatkar Wed, 22 Feb 2017 23:00:07 -0800

Dear Community,

I'm working on BEAM-831 to implement ParDo chaining for the Apache Apex runner.

As suggested on the Jira issue, the chaining needs to be done using the
stream locality feature of the Apache Apex engine.

I got some links from Eugene Kirpichov on the Jira issue. I'm currently
focusing on the producer-consumer fusion optimization. I'm not yet sure how
beneficial sibling fusion would be for the Apex runner at this point.

For producer-consumer fusion, I am able to identify which stages are
ParDos. The only thing I'm not sure about is when to stop merging ParDos.
For example, if the DAG is ParDo A -> ParDo B -> ParDo C -> ParDo D, it
might at times be more efficient to merge only B and C rather than all of
them.
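To make the question concrete, here is a minimal sketch of greedy
producer-consumer fusion over a linear chain, written in Java. The Stage
class, the fuseLinearChain method, and the shouldBreak predicate are
hypothetical illustrations, not Beam or Apex APIs; the open question is
exactly what shouldBreak should decide. In the Apex runner, each resulting
group could presumably be wired together with thread-local streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class ParDoChainFusion {

    /** Hypothetical stage descriptor; not a Beam or Apex class. */
    static final class Stage {
        final String name;
        final boolean isParDo;

        Stage(String name, boolean isParDo) {
            this.name = name;
            this.isParDo = isParDo;
        }
    }

    /**
     * Greedily fuses consecutive ParDo stages of a linear pipeline into groups.
     * A new group starts whenever either neighbour is not a ParDo, or the
     * hypothetical shouldBreak predicate says the producer-consumer edge
     * should remain a separate stream.
     */
    static List<List<String>> fuseLinearChain(List<Stage> chain,
                                              BiPredicate<Stage, Stage> shouldBreak) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        Stage previous = null;
        for (Stage stage : chain) {
            boolean fuse = previous != null
                    && previous.isParDo
                    && stage.isParDo
                    && !shouldBreak.test(previous, stage);
            if (!fuse && !current.isEmpty()) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(stage.name);
            previous = stage;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Stage> chain = List.of(
                new Stage("A", true), new Stage("B", true),
                new Stage("C", true), new Stage("D", true));
        // Never break: prints [[A, B, C, D]]
        System.out.println(fuseLinearChain(chain, (p, c) -> false));
        // Break after A and before D, fusing only B and C: prints [[A], [B, C], [D]]
        System.out.println(fuseLinearChain(chain,
                (p, c) -> p.name.equals("A") || c.name.equals("D")));
    }
}
```

The greedy loop only handles a linear chain; the hard part, choosing the
break points, is left entirely to the predicate.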

How should this decision be made? Is there any reference available for it?

Please suggest.

Thanks,
Chinmay.
