Dear Community,

I'm working on BEAM-831 to implement ParDo chaining for the Apache Apex Runner.

As suggested on Jira, chaining needs to be done using the stream locality
of the Apache Apex engine.

I got some links from Eugene Kirpichov on the Jira. I'm currently focusing
on the producer-consumer fusion optimization. I'm unsure how beneficial
sibling fusion would be for the Apex Runner as of now.

For producer-consumer fusion, I am able to identify which stages are
ParDos.
The only thing I'm not sure about is when to stop merging ParDos,
e.g. if the DAG is ParDo A -> ParDo B -> ParDo C -> ParDo D,
then at times it might be more efficient to merge only B & C rather than
all of them.
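
To make the question concrete, here is a rough sketch of the grouping step
as I picture it. It is not Beam or Apex code: Stage, fuse_chain, under_budget
and the per-stage cost numbers are all hypothetical names and values made up
for illustration, and the budget rule is only a placeholder for whatever the
real stopping criterion should be.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one pipeline stage."""
    name: str
    is_pardo: bool
    cost: float  # assumed per-element cost; not something Beam or Apex reports


def fuse_chain(stages: List[Stage],
               should_fuse: Callable[[List[Stage], Stage], bool]) -> List[List[Stage]]:
    """Greedily group adjacent ParDos in a linear chain.

    should_fuse(group, nxt) is the open question: it decides whether
    nxt joins the current group or starts a new one.
    """
    groups: List[List[Stage]] = []
    for stage in stages:
        current = groups[-1] if groups else None
        if (current is not None
                and stage.is_pardo
                and current[-1].is_pardo
                and should_fuse(current, stage)):
            current.append(stage)
        else:
            groups.append([stage])
    return groups


def under_budget(budget: float) -> Callable[[List[Stage], Stage], bool]:
    """Placeholder rule: fuse while the group's summed cost stays within budget."""
    def rule(group: List[Stage], nxt: Stage) -> bool:
        return sum(s.cost for s in group) + nxt.cost <= budget
    return rule


# A -> B -> C -> D, where A and D are assumed to be expensive.
chain = [Stage("A", True, 8.0), Stage("B", True, 1.0),
         Stage("C", True, 1.0), Stage("D", True, 8.0)]
fused = fuse_chain(chain, under_budget(5.0))
print([[s.name for s in g] for g in fused])  # [['A'], ['B', 'C'], ['D']]
```

With these made-up costs only B and C end up in the same group, which is the
case I described above. What I am missing is a sound basis for should_fuse.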

How should this decision be made? Is there any reference available for it?

Please suggest.

Thanks,
Chinmay.
