Re: [PR] perf: Improve hash agg performance in large-scale data situations [cloudberry]

via GitHub Fri, 20 Jun 2025 08:39:36 -0700


jianlirong commented on PR #1178:
URL: https://github.com/apache/cloudberry/pull/1178#issuecomment-2992092970


   What I want to express is that, for Streaming Partial HashAggregate, the 
intermediate state (NumericAggState) in the first-stage aggregation does not 
need to use the same data structure as the state in the final-stage 
aggregation. The final-stage state should indeed stay consistent with PG, and 
the two fields mentioned above can indeed exceed the maximum value of int32. 
For Streaming Partial HashAggregate, however, we can handle this specially: 
when either of those fields reaches the int32 maximum, we stream out the 
current Partial HashAggregate results and then start building a new Partial 
HashAggregate result from zero. Because the int32 range is very large, overall 
execution performance will not actually decline, even in extreme cases with a 
very high number of tuples: there, the cost of sending NumericAggState through 
Motion is almost negligible compared with scanning and aggregating more than 
2^31 - 1 tuples.
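A minimal C sketch of the flush-and-restart idea described above, under stated assumptions: PartialAggState, FinalAggState, partial_accum, partial_finish and combine_into_final are hypothetical names, not Cloudberry or PostgreSQL APIs. The real NumericAggState carries an arbitrary-precision sum and several counters; here a plain int64 sum stands in for the accumulator, and the "stream out" step combines directly into a final state in-process instead of going through a Motion. The cap is a parameter so the behavior can be exercised with small values; in the proposal it would be INT32_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical first-stage state: a 32-bit counter keeps it small to ship. */
typedef struct PartialAggState
{
    int32_t N;          /* tuples accumulated since the last flush */
    int64_t sum;        /* stand-in for the numeric sum accumulator */
} PartialAggState;

/* Hypothetical final-stage state: 64-bit counter, like PostgreSQL's int64 N. */
typedef struct FinalAggState
{
    int64_t N;          /* total tuples across all partial results */
    int64_t sum;        /* total of all partial sums */
    int64_t ncombined;  /* number of partial results received */
} FinalAggState;

/* "Stream out" one partial result: here it is combined in-process. */
static void combine_into_final(FinalAggState *final, const PartialAggState *partial)
{
    final->N += partial->N;
    final->sum += partial->sum;
    final->ncombined++;
}

/* Accumulate one value. If the 32-bit counter has reached the cap, flush the
 * current partial result and restart from zero before adding the value, so
 * partial->N never exceeds cap (and therefore never overflows int32). */
static void partial_accum(PartialAggState *partial, FinalAggState *final,
                          int64_t value, int32_t cap)
{
    assert(cap > 0);
    if (partial->N >= cap)
    {
        combine_into_final(final, partial);
        partial->N = 0;
        partial->sum = 0;
    }
    partial->N++;
    partial->sum += value;
}

/* At end of input, flush whatever is left in the partial state. */
static void partial_finish(PartialAggState *partial, FinalAggState *final)
{
    if (partial->N > 0)
    {
        combine_into_final(final, partial);
        partial->N = 0;
        partial->sum = 0;
    }
}
```

For example, feeding the values 1 through 10 with a cap of 4 emits three partial results (4 + 4 + 2 tuples), and the final state ends with N = 10 and sum = 55. With the cap set to INT32_MAX, each partial result stays within int32 while the final stage's 64-bit N can still count past that range.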


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@cloudberry.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-h...@cloudberry.apache.org