[jira] [Commented] (BEAM-7912) Optimize GroupIntoBatches for batch Dataflow pipelines

Luke Cwik (JIRA) Wed, 07 Aug 2019 11:32:13 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902392#comment-16902392
 ]


Luke Cwik commented on BEAM-7912:
---------------------------------

This optimization can be done in Python as well:
https://github.com/apache/beam/blob/bc2c6ff5d4a464a4103db4f9835bac2e42258771/sdks/python/apache_beam/transforms/util.py#L690

> Optimize GroupIntoBatches for batch Dataflow pipelines
> ------------------------------------------------------
>
>                 Key: BEAM-7912
>                 URL: https://issues.apache.org/jira/browse/BEAM-7912
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Luke Cwik
>            Assignee: Luke Cwik
>            Priority: Minor
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The GroupIntoBatches transform can be significantly optimized for batch
> pipelines on Dataflow, since Dataflow always ensures that a key K appears in
> only one bundle after a GroupByKey. This removes the need for the state and
> timers used by the generic GroupIntoBatches transform.
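
As a rough illustration of the idea in the description (not the code from the linked util.py or from any Beam release), the sketch below shows why state and timers become unnecessary once all values for a key arrive together: batching reduces to chunking one iterable. The function name batch_grouped_values and its parameters are hypothetical, and the sketch is plain Python with no Beam dependency.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def batch_grouped_values(
    key: K, values: Iterable[V], batch_size: int
) -> Iterator[Tuple[K, List[V]]]:
    """Split the already-grouped values of one key into lists of at most batch_size.

    Hypothetical helper: it assumes every value for `key` is present in
    `values`, which is the guarantee the issue attributes to a GroupByKey
    in a batch Dataflow pipeline.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    remaining = iter(values)
    while True:
        # Take up to batch_size values; no per-key state has to survive
        # between bundles because the whole group is visible here.
        batch = list(islice(remaining, batch_size))
        if not batch:
            return
        yield key, batch


# Example: five values for key "a" in batches of two.
batches = list(batch_grouped_values("a", range(5), 2))
```

In a pipeline, a function like this would presumably be applied to each (key, values) pair produced by a GroupByKey, for example through a FlatMap step. How the optimization is actually wired into the runner is not shown in this message.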



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
