[jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey

Jingsong Lee (JIRA) Tue, 20 Jun 2017 02:10:38 -0700

    [ 
https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055416#comment-16055416
 ]


Jingsong Lee commented on BEAM-2477:
------------------------------------

*Local combine*: Cloud Dataflow and Flink Batch optimize Combine operations 
(such as Count and Sum) by performing partial combining locally before sending 
the data to the main grouping operation. See the graph optimizations described in 
https://cloud.google.com/blog/big-data/2017/05/after-lambda-exactly-once-processing-in-cloud-dataflow-part-2-ensuring-low-latency
*Incremental aggregation*: Each element is folded into an accumulator as it 
arrives, similar to Flink's concept: 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#windowfunction-with-incremental-aggregation

GroupByKey, by contrast, keeps every individual element until the window 
closes (AFAIK, at least in the Flink Runner).
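
To make the difference concrete, here is a minimal sketch in plain Python. It does not use the Beam or Flink APIs. The function names, the notion of a "bundle" as a list of key/value pairs, and the "shuffled" counter are all assumptions made for illustration. The sketch only simulates how many records would cross the grouping boundary in each style.

```python
from collections import defaultdict


def group_by_key_then_sum(bundles):
    """GroupByKey style (hypothetical helper): every raw element is
    sent to the grouping step and buffered per key before summing."""
    buffered = defaultdict(list)
    shuffled = 0
    for bundle in bundles:
        for key, value in bundle:
            buffered[key].append(value)  # full element detail is kept
            shuffled += 1
    return {key: sum(values) for key, values in buffered.items()}, shuffled


def combine_per_key_sum(bundles):
    """Combine style (hypothetical helper): each bundle is pre-summed
    locally, so only one partial result per key per bundle is sent."""
    totals = defaultdict(int)
    shuffled = 0
    for bundle in bundles:
        partial = defaultdict(int)
        for key, value in bundle:
            partial[key] += value  # local, incremental accumulation
        for key, accumulated in partial.items():
            totals[key] += accumulated  # merge partial accumulators
            shuffled += 1
    return dict(totals), shuffled


# Two illustrative bundles, as if processed on two workers.
bundles = [
    [("a", 1), ("a", 2), ("b", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

gbk_result, gbk_shuffled = group_by_key_then_sum(bundles)
cpk_result, cpk_shuffled = combine_per_key_sum(bundles)
```

Both functions produce the same per-key sums, but the combine-style version forwards at most one record per key per bundle and never holds more than one accumulator per key. How large the saving is on a real runner depends on the key distribution and on the runner's own optimizations.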

> BeamAggregationRel should use Combine.perKey instead of GroupByKey
> ------------------------------------------------------------------
>
>                 Key: BEAM-2477
>                 URL: https://issues.apache.org/jira/browse/BEAM-2477
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>              Labels: dsl_sql_merge
>
> The two have the same semantics, but their implementations differ greatly 
> in efficiency, and runners apply many optimizations to 
> `Combine.perKey`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
