[
https://issues.apache.org/jira/browse/FLINK-14874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
godfrey he updated FLINK-14874:
-------------------------------
Description:
Many tpc-ds queries have {{rollup}} keyword, which will be translated to
multiple groups.
for example: {{group by rollup (channel, id)}} is equivalent {{group by
(channel, id)}} + {{group by (channel)}} + {{group by ()}}.
All data on empty group will be shuffled to a single node, It is a typical data
skew case. If there is a local aggregate, the data size shuffled to the single
node will be greatly reduced. However, currently the cost mode can't estimate
the local aggregate's cost, and the plan with local aggregate may be chose even
the query has {{rollup}} keyword.
we could add a rule based phase (after physical phase) to enforce local
aggregate if it's input has empty group.
was:
Many tpc-ds queries have {{rollup}} keyword, which will be translated to
multiple groups.
for example: {{group by rollup (channel, id)}} is equivalent {{ group by
(channel, id)}} + {{ group by (channel)}} + {{ group by () }}.
All data on empty group will be shuffled to a single node, It is a typical data
skew case. If there is a local aggregate, the data size shuffled to the single
node will be greatly reduced. However, currently the cost mode can't estimate
the local aggregate's cost, and the plan with local aggregate may be chose even
the query has {{rollup}} keyword.
we could add a rule based phase (after physical phase) to enforce local
aggregate if it's input has empty group.
> add local aggregate to solve data skew for ROLLUP/CUBE case
> -----------------------------------------------------------
>
> Key: FLINK-14874
> URL: https://issues.apache.org/jira/browse/FLINK-14874
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / Planner
> Affects Versions: 1.9.1
> Reporter: godfrey he
> Priority: Major
> Fix For: 1.10.0
>
>
> Many tpc-ds queries have {{rollup}} keyword, which will be translated to
> multiple groups.
> for example: {{group by rollup (channel, id)}} is equivalent {{group by
> (channel, id)}} + {{group by (channel)}} + {{group by ()}}.
> All data on empty group will be shuffled to a single node, It is a typical
> data skew case. If there is a local aggregate, the data size shuffled to the
> single node will be greatly reduced. However, currently the cost mode can't
> estimate the local aggregate's cost, and the plan with local aggregate may be
> chose even the query has {{rollup}} keyword.
> we could add a rule based phase (after physical phase) to enforce local
> aggregate if it's input has empty group.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)