yjshen commented on issue #12596: URL: https://github.com/apache/datafusion/issues/12596#issuecomment-2372599393
> Introduce the partitioned hashtable in partial aggregation, and we partition the data before inserting them into hashtable.
> And we push them into final aggregation partition by partition after, rather than split them again in repartition, and merge them again in coalesce.

I'm not clear on how this proposal works. Could you please explain why it provides performance benefits compared to the current plan of partial aggregation, exchange (repartition), and final aggregation? Is the proposal aimed specifically at accelerating high-cardinality aggregation, or is it intended to improve aggregation performance in general?