[ 
https://issues.apache.org/jira/browse/BEAM-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090920#comment-17090920
 ] 

Luke Cwik commented on BEAM-7199:
---------------------------------

The SDF expansion is PairWithRestriction -> InitialSplittingWithSizing -> 
Reshuffle -> ProcessSizedElementsAndRestrictions.
The implementation is here:
https://github.com/apache/beam/blob/ec67a9374671ea9ae670fb0f3935ead2ebed7981/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/SplittableParDoExpander.java#L68

The optimization is that the initial splitting happens during execution, as 
data in the pipeline, and the reshuffle enables runners to "redistribute" the 
work across multiple workers.
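
To make the four stages above concrete, here is a minimal plain-Python sketch 
of the data flow. It does not use the Beam API. The string elements, the 
offset-range restrictions, the chunk size, and the round-robin "reshuffle" are 
all assumptions made for illustration; only the stage names come from the 
expansion described above.

```python
import random
from typing import List, Tuple

# Hypothetical restriction type: a half-open offset range [start, stop).
Restriction = Tuple[int, int]
Sized = Tuple[Tuple[str, Restriction], int]


def pair_with_restriction(elements: List[str]) -> List[Tuple[str, Restriction]]:
    # Pair each element with one restriction covering the whole element.
    return [(e, (0, len(e))) for e in elements]


def initial_splitting_with_sizing(
    pairs: List[Tuple[str, Restriction]], chunk: int = 4
) -> List[Sized]:
    # Split each restriction at execution time and attach a size estimate.
    out: List[Sized] = []
    for element, (start, stop) in pairs:
        for s in range(start, stop, chunk):
            sub = (s, min(s + chunk, stop))
            out.append(((element, sub), sub[1] - sub[0]))
    return out


def reshuffle(sized: List[Sized], num_workers: int, seed: int = 0) -> List[List[Sized]]:
    # Stand-in for a runner redistributing work: shuffle, then deal round-robin.
    rng = random.Random(seed)
    shuffled = list(sized)
    rng.shuffle(shuffled)
    bundles: List[List[Sized]] = [[] for _ in range(num_workers)]
    for i, item in enumerate(shuffled):
        bundles[i % num_workers].append(item)
    return bundles


def process_sized_elements_and_restrictions(bundle: List[Sized]) -> List[str]:
    # Each worker processes only the sub-restrictions it was handed.
    return [element[start:stop] for (element, (start, stop)), _size in bundle]


elements = ["abcdefghij", "klmno"]
sized = initial_splitting_with_sizing(pair_with_restriction(elements))
bundles = reshuffle(sized, num_workers=2)
outputs = [process_sized_elements_and_restrictions(b) for b in bundles]
```

Because the split results are ordinary data flowing through the pipeline, the 
reshuffle step is free to hand any sub-restriction to any worker.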

The Combiner optimization should be done through an expansion similar to the 
SDF one. The expansion is documented here: 
https://docs.google.com/document/d/1-3mEs3Y7bIkJ0hmQ6SiHpVIFu5vbY6Zcpw-7tOMVg4U/edit#heading=h.eojkgyq8j323
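
The linked document is not reproduced here. As a rough sketch only, assuming 
the usual combiner lifting into Precombine -> GroupByKey -> MergeAccumulators 
-> ExtractOutputs, the expansion could look like the following in plain 
Python. The mean combiner and all function names are hypothetical and are not 
taken from the Beam codebase.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mean combiner; the accumulator is a (sum, count) pair.
Accumulator = Tuple[float, int]


def create_accumulator() -> Accumulator:
    return (0, 0)


def add_input(acc: Accumulator, value: float) -> Accumulator:
    return (acc[0] + value, acc[1] + 1)


def merge_accumulators(accs: List[Accumulator]) -> Accumulator:
    total, count = 0, 0
    for s, c in accs:
        total += s
        count += c
    return (total, count)


def extract_output(acc: Accumulator) -> float:
    return acc[0] / acc[1] if acc[1] else float("nan")


def precombine(bundle: List[Tuple[str, float]]) -> List[Tuple[str, Accumulator]]:
    # Runs before the shuffle: at most one accumulator per key per bundle.
    accs: Dict[str, Accumulator] = {}
    for key, value in bundle:
        accs[key] = add_input(accs.get(key, create_accumulator()), value)
    return list(accs.items())


def group_by_key(pairs: List[Tuple[str, Accumulator]]) -> Dict[str, List[Accumulator]]:
    grouped: Dict[str, List[Accumulator]] = defaultdict(list)
    for key, acc in pairs:
        grouped[key].append(acc)
    return dict(grouped)


def lifted_combine_per_key(bundles: List[List[Tuple[str, float]]]) -> Dict[str, float]:
    # Precombine -> GroupByKey -> MergeAccumulators -> ExtractOutputs
    partial = [kv for bundle in bundles for kv in precombine(bundle)]
    grouped = group_by_key(partial)
    return {k: extract_output(merge_accumulators(v)) for k, v in grouped.items()}


bundles = [[("a", 1), ("b", 10), ("a", 3)], [("a", 5), ("b", 20)]]
result = lifted_combine_per_key(bundles)
```

As with the SDF case, the point of the expansion is that only the small 
per-bundle accumulators cross the shuffle boundary, not every input element.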

> Better optimize Portable pipelines
> ----------------------------------
>
>                 Key: BEAM-7199
>                 URL: https://issues.apache.org/jira/browse/BEAM-7199
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Ankur Goenka
>            Priority: Major
>              Labels: portability
>
> Python has an experimental flag pre_optimize=all which does pre-optimization 
> of Python pipelines by fusing operators.
> Python optimization is expected to be better than the one in Java because it 
> has more information about the pipeline.
> Make Java pipeline optimization on par with Python so that the benefits can 
> be shared by all languages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
