[ https://issues.apache.org/jira/browse/FLINK-32281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dalongliu updated FLINK-32281:
------------------------------
Description: For the HashAgg operator, the planner currently prefers a
one-phase aggregation when statistics cannot be estimated accurately. In some
production queries, however, a two-phase aggregation is the better choice: in
the TPC-DS cases, we found that for some query patterns, choosing a two-phase
aggregation significantly reduces query runtime. In
https://issues.apache.org/jira/browse/FLINK-30542 we introduced adaptive local
HashAgg, which adaptively skips aggregation when the local phase's aggregation
degree is relatively low (i.e., local pre-aggregation barely reduces the
number of rows), and which can greatly improve the performance of two-phase
aggregation in some queries. Against this background, this issue proposes
enabling two-phase aggregation by default for functions that support adaptive
local HashAgg, such as sum/count/min/max, so that adaptive local HashAgg can
improve the performance of aggregation queries. For OFCG (operator fusion
codegen), enabling two-phase aggregation by default also lets the local agg
operator be placed into the fused operator, so that it benefits from OFCG too.
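To make the two mechanisms in the description concrete, here is a minimal Python sketch. It is not Flink's implementation: the names `AdaptiveLocalSumAgg`, `global_sum`, `run_local`, `choose_phase` and `ADAPTIVE_CAPABLE`, the tiny sampling threshold, and the distinct-rate threshold of 0.5 are all hypothetical, chosen only for illustration. The local sum aggregator samples its first rows; if the ratio of distinct keys to sampled rows exceeds the threshold (low aggregation degree), it flushes its hash table and forwards all later rows unaggregated. `choose_phase` sketches the proposed default: two-phase whenever every aggregate function supports adaptive local HashAgg, otherwise the existing behavior.

```python
# Hypothetical set of functions assumed to support adaptive local HashAgg;
# the issue names sum/count/min/max as examples.
ADAPTIVE_CAPABLE = {"sum", "count", "min", "max"}


class AdaptiveLocalSumAgg:
    """Illustrative local (first-phase) SUM aggregator that can stop aggregating."""

    def __init__(self, sampling_threshold=4, distinct_rate_threshold=0.5):
        self.sampling_threshold = sampling_threshold  # rows to sample before deciding
        self.distinct_rate_threshold = distinct_rate_threshold
        self.table = {}  # key -> partial sum
        self.seen = 0
        self.skipping = False

    def process(self, key, value, emit):
        if self.skipping:
            # Aggregation degree was judged too low: forward the row as-is.
            emit((key, value))
            return
        self.table[key] = self.table.get(key, 0) + value
        self.seen += 1
        if self.seen == self.sampling_threshold:
            distinct_rate = len(self.table) / self.seen
            if distinct_rate > self.distinct_rate_threshold:
                # Pre-aggregation barely reduces rows: flush and bypass from now on.
                self.flush(emit)
                self.skipping = True

    def flush(self, emit):
        # Emit buffered partial sums (called on switch-over and at end of input).
        for k, v in self.table.items():
            emit((k, v))
        self.table.clear()


def global_sum(partials):
    """Second (global) phase: merges partial sums and raw rows alike."""
    out = {}
    for k, v in partials:
        out[k] = out.get(k, 0) + v
    return out


def run_local(rows, **kwargs):
    agg = AdaptiveLocalSumAgg(**kwargs)
    partials = []
    for k, v in rows:
        agg.process(k, v, partials.append)
    agg.flush(partials.append)  # end of input
    return agg, partials


def choose_phase(agg_functions, stats_reliable, cost_prefers_two_phase):
    """Sketch of the proposed rule; not the planner's actual cost model."""
    if all(f in ADAPTIVE_CAPABLE for f in agg_functions):
        return "TWO_PHASE"  # proposed default
    if not stats_reliable:
        return "ONE_PHASE"  # current preference when statistics are unreliable
    return "TWO_PHASE" if cost_prefers_two_phase else "ONE_PHASE"
```

In this sketch the global phase re-aggregates whatever the local phase emits, so skipping local aggregation only increases the volume sent to the global phase and never changes the result; that property is what makes a two-phase default low-risk for such functions.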
> Enable two-phase HashAgg by default when the agg function supports adaptive
> local HashAgg
> ---------------------------------------------------------------------------------
>
> Key: FLINK-32281
> URL: https://issues.apache.org/jira/browse/FLINK-32281
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / Planner
> Reporter: dalongliu
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)