lsyldliu opened a new pull request, #22914: URL: https://github.com/apache/flink/pull/22914
## What is the purpose of the change

*For the HashAgg operator, the planner currently prefers a one-phase agg when statistics cannot be estimated accurately. In some production queries, a two-phase agg may be the more reasonable choice. In the TPC-DS cases, we find that for some query patterns, choosing a two-phase agg significantly reduces query runtime. In https://issues.apache.org/jira/browse/FLINK-30542 , we introduced the adaptive local hashagg, which can adaptively skip aggregation when the aggregation degree of the local phase is relatively low; this can greatly improve the performance of two-phase aggregation in some queries. Based on this background, in this issue we propose to turn on two-phase agg by default for functions that support adaptive local hashagg, such as sum/count/min/max, etc., so as to exploit the ability of adaptive local hashagg to improve the performance of agg queries. For OFCG, if we turn on two-phase agg by default, the local agg operator can also be put into the fused operator and so benefit from OFCG.*

## Brief change log

- *Enable two-phase HashAgg by default when all aggregate functions in the query support adaptive local HashAgg*

## Verifying this change

This change is already covered by existing tests.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not documented)

--
This is an automated message from the Apache Git Service.
