[jira] [Updated] (FLINK-12161) Supports partial-final optimization for stream group aggregate

godfrey he (JIRA) Thu, 11 Apr 2019 00:19:39 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-12161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


godfrey he updated FLINK-12161:
-------------------------------
    Description: 
To resolve data-skew for distinct aggregates on stream, we introduce a rule 
named {{SplitAggregateRule}} which rewrites an aggregate query with distinct 
aggregations into an expanded double aggregations. The first aggregation 
compute the results in sub-partition(with bucket) and the results are combined 
by the second aggregation.

if two-stage aggregation is also enabled, we find that many plans have common 
pattern, looks like:

{code}
... (output)
StreamExecGlobalGroupAggregate (final global agg)
+- StreamExecExchange
     +- StreamExecLocalGroupAggregate (final local agg)
          +- StreamExecGlobalGroupAggregate (partial global agg)
               +- .... (input)
{code}

There is no exchange between the final local aggregate and the partial global 
aggregate, so they will be executed in a same JobVertex, and could share state. 
We introduce a rule named {{IncrementalAggregateRule}} to do that optimization.

  was:
To resolve data-skew for distinct aggregates on stream, we introduce a rule 
named {{SplitAggregateRule}} which rewrites an aggregate query with distinct 
aggregations into an expanded double aggregations. The first aggregation 
compute the results in sub-partition(with bucket) and the results are combined 
by the second aggregation.

if two-stage aggregation is also enabled, we find that many plans have common 
pattern, looks like:

{code}
...
StreamExecGlobalGroupAggregate (final global agg)
+- StreamExecExchange
     +- StreamExecLocalGroupAggregate (final local agg)
          +- StreamExecGlobalGroupAggregate (partial global agg)
               +- ....
{code}

There is no exchange between the final local aggregate and the partial global 
aggregate, so they will be executed in a same JobVertex, and could share state. 
We introduce a rule named {{IncrementalAggregateRule}} to do that optimization.


>  Supports partial-final optimization for stream group aggregate
> ---------------------------------------------------------------
>
>                 Key: FLINK-12161
>                 URL: https://issues.apache.org/jira/browse/FLINK-12161
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / Planner
>            Reporter: godfrey he
>            Assignee: godfrey he
>            Priority: Major
>
> To resolve data-skew for distinct aggregates on stream, we introduce a rule 
> named {{SplitAggregateRule}} which rewrites an aggregate query with distinct 
> aggregations into an expanded double aggregations. The first aggregation 
> compute the results in sub-partition(with bucket) and the results are 
> combined by the second aggregation.
> if two-stage aggregation is also enabled, we find that many plans have common 
> pattern, looks like:
> {code}
> ... (output)
> StreamExecGlobalGroupAggregate (final global agg)
> +- StreamExecExchange
>      +- StreamExecLocalGroupAggregate (final local agg)
>           +- StreamExecGlobalGroupAggregate (partial global agg)
>                +- .... (input)
> {code}
> There is no exchange between the final local aggregate and the partial global 
> aggregate, so they will be executed in a same JobVertex, and could share 
> state. We introduce a rule named {{IncrementalAggregateRule}} to do that 
> optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (FLINK-12161) Supports partial-final optimization for stream group aggregate

Reply via email to