[ 
https://issues.apache.org/jira/browse/BEAM-10409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244126#comment-17244126
 ] 

Beam JIRA Bot commented on BEAM-10409:
--------------------------------------

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> Add combiner packing to graph optimizer phases
> ----------------------------------------------
>
>                 Key: BEAM-10409
>                 URL: https://issues.apache.org/jira/browse/BEAM-10409
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-core
>            Reporter: Yifan Mai
>            Assignee: Yifan Mai
>            Priority: P2
>              Labels: stale-assigned
>          Time Spent: 8h
>  Remaining Estimate: 0h
>
> Some use cases of Beam (e.g. [TensorFlow 
> Transform|https://github.com/tensorflow/transform]) create thousands of 
> Combine stages with a common parent. The large number of stages can cause 
> performance issues on some runners. To alleviate, a graph optimization phase 
> could be added to the translations module that packs compatible Combine 
> stages into a single stage.
> The graph optimization for CombinePerKey would work as follows: If 
> CombinePerKey stages have a common input, one input each, and one output 
> each, pack the stages into a single stage that runs all CombinePerKeys and 
> outputs resulting tuples to a new PCollection. A subsequent stage unpacks 
> tuples from this PCollection and sends them to the original output 
> PCollections.
> There is an additional issue with supporting this for CombineGlobally: 
> because of the intermediate KeyWithVoid stage between the CombinePerKey 
> stages and the input stage, the CombinePerKey stages do not have a common 
> input stage, and cannot be packed. To support CombineGlobally, a common 
> sibling elimination graph optimization phase can be used to combine the 
> KeyWithVoid stages. After this, the CombinePerKey stages would have a common 
> input and can be packed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to