[
https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055416#comment-16055416
]
Jingsong Lee commented on BEAM-2477:
------------------------------------
*Local combine*: Cloud Dataflow and Flink (batch) optimize Combine operations
(such as Count and Sum) by partially combining values locally before sending
the data to the main grouping operation. See the graph optimizations described in
https://cloud.google.com/blog/big-data/2017/05/after-lambda-exactly-once-processing-in-cloud-dataflow-part-2-ensuring-low-latency
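
To make the local-combine point concrete, here is a minimal plain-Java sketch. It does not use the Beam API; the class `LocalCombineSketch` and its helpers `kv`, `combineLocally`, `mergePartials`, and `groupThenSum` are hypothetical names made up for illustration, and each "bundle" stands in for the data held by one worker. It only shows why pre-combining a Sum per key sends fewer records to the grouping step while giving the same result.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalCombineSketch {

    /** Hypothetical helper: builds one key/value record. */
    static Map.Entry<String, Long> kv(String key, long value) {
        return new AbstractMap.SimpleEntry<>(key, Long.valueOf(value));
    }

    /** Local combine: one running sum per key for a single bundle. */
    static Map<String, Long> combineLocally(List<Map.Entry<String, Long>> bundle) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> e : bundle) {
            partial.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return partial;
    }

    /** After the shuffle: merge the partial sums coming from all bundles. */
    static Map<String, Long> mergePartials(List<Map<String, Long>> partials) {
        Map<String, Long> result = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((k, v) -> result.merge(k, v, Long::sum));
        }
        return result;
    }

    /** GroupByKey style: every raw value is shuffled and buffered, then summed. */
    static Map<String, Long> groupThenSum(List<List<Map.Entry<String, Long>>> bundles) {
        Map<String, List<Long>> grouped = new HashMap<>();
        for (List<Map.Entry<String, Long>> bundle : bundles) {
            for (Map.Entry<String, Long> e : bundle) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<String, Long> result = new HashMap<>();
        grouped.forEach((k, values) -> {
            long sum = 0;
            for (long v : values) {
                sum += v;
            }
            result.put(k, sum);
        });
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> bundle1 = Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3));
        List<Map.Entry<String, Long>> bundle2 = Arrays.asList(kv("a", 4), kv("b", 5));

        Map<String, Long> partial1 = combineLocally(bundle1);
        Map<String, Long> partial2 = combineLocally(bundle2);

        // Records crossing the shuffle: raw elements vs. one partial sum per key per bundle.
        int rawRecords = bundle1.size() + bundle2.size();
        int combinedRecords = partial1.size() + partial2.size();
        System.out.println("shuffled without local combine: " + rawRecords);
        System.out.println("shuffled with local combine:    " + combinedRecords);

        System.out.println("combined: " + mergePartials(Arrays.asList(partial1, partial2)));
        System.out.println("grouped:  " + groupThenSum(Arrays.asList(bundle1, bundle2)));
    }
}
```

Both paths produce the same per-key sums; the saving grows with the number of repeated keys per bundle.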
*Incremental aggregation*: Combine.perKey can aggregate elements as they
arrive, similar to Flink's incremental window aggregation:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#windowfunction-with-incremental-aggregation
GroupByKey, by contrast, keeps every element until the window closes
(AFAIK, in the Flink runner).
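
A rough plain-Java sketch of the difference in per-window state, again not Beam or Flink code: `IncrementalAggregationSketch`, `BufferingWindowState`, and `IncrementalWindowState` are hypothetical names, and the "window" is just an object that receives elements and is asked for a result when it closes.

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalAggregationSketch {

    /** GroupByKey-like state: keeps every element until the window closes. */
    static final class BufferingWindowState {
        private final List<Long> elements = new ArrayList<>();

        void add(long value) {
            elements.add(value);
        }

        /** Number of values held in state. */
        int stateSize() {
            return elements.size();
        }

        long onWindowClose() {
            long sum = 0;
            for (long v : elements) {
                sum += v;
            }
            return sum;
        }
    }

    /** Combine-like state: folds each element into a single accumulator on arrival. */
    static final class IncrementalWindowState {
        private long sum = 0;

        void add(long value) {
            sum += value;
        }

        /** Always one value, regardless of how many elements arrived. */
        int stateSize() {
            return 1;
        }

        long onWindowClose() {
            return sum;
        }
    }

    public static void main(String[] args) {
        BufferingWindowState buffering = new BufferingWindowState();
        IncrementalWindowState incremental = new IncrementalWindowState();
        for (long v = 1; v <= 1000; v++) {
            buffering.add(v);
            incremental.add(v);
        }
        System.out.println("buffering state size:   " + buffering.stateSize());
        System.out.println("incremental state size: " + incremental.stateSize());
        System.out.println("same result: "
            + (buffering.onWindowClose() == incremental.onWindowClose()));
    }
}
```

The result is the same either way; only the amount of state held until the window closes differs.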
> BeamAggregationRel should use Combine.perKey instead of GroupByKey
> ------------------------------------------------------------------
>
> Key: BEAM-2477
> URL: https://issues.apache.org/jira/browse/BEAM-2477
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Jingsong Lee
> Assignee: Jingsong Lee
> Labels: dsl_sql_merge
>
> Their semantics are the same, but the efficiency of implementation is quite
> different, and at the runner level there is a lot of optimization for
> `Combine.perKey`.