godfrey he created FLINK-14874:
----------------------------------

             Summary: add local aggregate to solve data skew for ROLLUP/CUBE 
case
                 Key: FLINK-14874
                 URL: https://issues.apache.org/jira/browse/FLINK-14874
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / Planner
    Affects Versions: 1.9.1, 1.9.0
            Reporter: godfrey he
             Fix For: 1.10.0


Many tpc-ds queries have {{rollup}} keyword, which will be translated to 
multiple groups. 
for example:  {{group by rollup (channel, id) }} is equivalent {{ group by 
(channel, id)}} +  {{ group by (channel)}} +  {{ group by () }}. 
All data on empty group will be shuffled to a single node, It is a typical data 
skew case. If there is a local aggregate, the data size shuffled to the single 
node will be greatly reduced. However, currently the cost mode can't estimate 
the local aggregate's cost, and the plan with local aggregate may be chose even 
the query has {{rollup}} keyword.
we could add a rule based phase (after physical phase) to enforce local 
aggregate if it's input has empty group.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to