lsyldliu opened a new pull request, #22914:
URL: https://github.com/apache/flink/pull/22914

   ## What is the purpose of the change
   
   *For the HashAgg operator, the planner currently prefers one-phase aggregation
when the statistics cannot be accurately estimated. In some production queries,
however, two-phase aggregation is the more reasonable choice. In the TPC-DS
cases, we found that choosing two-phase aggregation for certain query patterns
significantly reduces query runtime. In
https://issues.apache.org/jira/browse/FLINK-30542, we introduced adaptive local
HashAgg, which adaptively skips aggregation when the aggregation degree of the
local phase is relatively low. This can greatly improve the performance of
two-phase aggregation in some queries. Against this background, this issue
proposes enabling two-phase aggregation by default for functions that support
adaptive local HashAgg, such as sum/count/min/max, so that agg queries can
exploit adaptive local HashAgg to improve performance. For operator fusion
codegen (OFCG), enabling two-phase aggregation by default also allows the local
agg operator to be put into the fused operator, so that it benefits from OFCG
as well.*
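To illustrate what "adaptively skipping aggregation" in the local phase means, here is a minimal sketch. It is not Flink's generated HashAgg code. The class `AdaptiveLocalSumAgg`, its sampling rule (inspect the first N records, then compare distinct keys to records seen), and both thresholds are hypothetical assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an adaptive local SUM aggregation (not Flink's
// generated HashAgg code). The sampling rule and thresholds are assumptions.
class AdaptiveLocalSumAgg {
    private final long samplingThreshold;         // records inspected before deciding
    private final double distinctRateThreshold;   // distinct keys / sampled records
    private final Map<String, Long> buffer = new HashMap<>();
    private final List<Map.Entry<String, Long>> forwarded = new ArrayList<>();
    private long seen = 0;
    private boolean skipAggregation = false;

    AdaptiveLocalSumAgg(long samplingThreshold, double distinctRateThreshold) {
        this.samplingThreshold = samplingThreshold;
        this.distinctRateThreshold = distinctRateThreshold;
    }

    void processElement(String key, long value) {
        if (skipAggregation) {
            // Aggregation degree is low: forward the record unchanged as a partial result.
            forwarded.add(Map.entry(key, value));
            return;
        }
        buffer.merge(key, value, Long::sum);
        seen++;
        if (seen == samplingThreshold) {
            double distinctRate = (double) buffer.size() / seen;
            // Almost every key is new, so local aggregation barely reduces the data.
            skipAggregation = distinctRate > distinctRateThreshold;
        }
    }

    // Emits the partial results that the global aggregation phase would consume.
    List<Map.Entry<String, Long>> endInput() {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        buffer.forEach((k, v) -> out.add(Map.entry(k, v)));
        out.addAll(forwarded);
        return out;
    }

    boolean isSkippingAggregation() {
        return skipAggregation;
    }
}
```

In this sketch, skipping is always safe for correctness. The global phase re-aggregates partial results by key, so duplicate keys in the local output only give up local data reduction, in exchange for avoiding hash-table work that would not pay off.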
   
   
   ## Brief change log
   
     - *Enable two-phase HashAgg by default when all aggregate functions in a query support adaptive local HashAgg*
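The rule in the change log can be sketched as below. This is a hypothetical illustration, not Flink planner code. `AggPhaseChooser`, `AggPhase`, `choose()`, and the exact function set (taken from the sum/count/min/max examples above) are all assumptions.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the phase decision described in this PR; AggPhase,
// choose() and the function set are illustrative assumptions, not Flink classes.
class AggPhaseChooser {
    enum AggPhase { ONE_PHASE, TWO_PHASE }

    // Assumed set of aggregate functions that support adaptive local HashAgg.
    private static final Set<String> ADAPTIVE_LOCAL_AGG_FUNCTIONS =
            Set.of("SUM", "COUNT", "MIN", "MAX");

    // Returns TWO_PHASE only when every aggregate function supports adaptive
    // local HashAgg; otherwise keeps the planner's previous choice (fallback).
    static AggPhase choose(List<String> aggFunctions, AggPhase fallback) {
        boolean allAdaptive = !aggFunctions.isEmpty()
                && aggFunctions.stream()
                        .allMatch(f -> ADAPTIVE_LOCAL_AGG_FUNCTIONS.contains(
                                f.toUpperCase(Locale.ROOT)));
        return allAdaptive ? AggPhase.TWO_PHASE : fallback;
    }
}
```

Here `fallback` stands in for whatever the planner would have chosen before this change, for example one-phase aggregation when statistics cannot be estimated. A single unsupported function keeps the old behavior for the whole query.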
   
   
   ## Verifying this change
   
   This change is already covered by existing tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not documented)
   

