Rachelint commented on issue #20773: URL: https://github.com/apache/datafusion/issues/20773#issuecomment-4017032783
As I see it, one of the most obvious bottlenecks of aggregation in DataFusion is `RepartitionExec`. The partitioned hash table approach may help. I did an experiment on it before: https://github.com/apache/datafusion/pull/12526

The performance improvement came from removing `RepartitionExec`. But taking `skip partial aggregations` into consideration, I don't think it is a good or general way to solve the performance problem of `RepartitionExec`. I think #15383 may be the better approach.
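To make the "partitioned hash table" idea concrete, here is a minimal sketch for a `SUM(value) GROUP BY key` over `u64` keys. It is a generic illustration, not the code from #12526, and every name in it (`PartitionedAggTable`, `partition_of`, `final_merge`) is hypothetical. Each partial aggregator keeps one sub-table per output partition, so a key's final partition is decided while aggregating. The final stage then merges sub-table `p` from every partial aggregator directly, which is the step a separate `RepartitionExec` shuffle would otherwise perform.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical partitioned table for a partial SUM(value) GROUP BY key:
/// one HashMap per output partition instead of a single HashMap.
struct PartitionedAggTable {
    partitions: Vec<HashMap<u64, i64>>,
}

/// Route a key to an output partition. DefaultHasher::new() uses fixed keys,
/// so every partial aggregator in this process picks the same partition.
fn partition_of(key: u64, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

impl PartitionedAggTable {
    fn new(num_partitions: usize) -> Self {
        Self {
            partitions: (0..num_partitions).map(|_| HashMap::new()).collect(),
        }
    }

    /// Partial aggregation: accumulate into the sub-table that owns this key.
    fn update(&mut self, key: u64, value: i64) {
        let p = partition_of(key, self.partitions.len());
        *self.partitions[p].entry(key).or_insert(0) += value;
    }
}

/// Final aggregation for output partition `p`: merge sub-table `p` from every
/// partial aggregator. No shuffle is needed because a key only ever lands in
/// one partition index.
fn final_merge(partials: &[PartitionedAggTable], p: usize) -> HashMap<u64, i64> {
    let mut out = HashMap::new();
    for t in partials {
        for (&k, &v) in &t.partitions[p] {
            *out.entry(k).or_insert(0) += v;
        }
    }
    out
}

fn main() {
    let num_partitions = 4;
    // Two input streams, e.g. two scan partitions of (key, value) rows.
    let inputs: Vec<Vec<(u64, i64)>> = vec![
        vec![(1, 10), (2, 5), (3, 1), (1, 2)],
        vec![(2, 7), (4, 3), (1, 1)],
    ];

    // Partial stage: each input builds its own partitioned table.
    let partials: Vec<PartitionedAggTable> = inputs
        .iter()
        .map(|rows| {
            let mut t = PartitionedAggTable::new(num_partitions);
            for &(k, v) in rows {
                t.update(k, v);
            }
            t
        })
        .collect();

    // Final stage: each output partition is merged independently.
    let mut totals: HashMap<u64, i64> = HashMap::new();
    for p in 0..num_partitions {
        for (k, v) in final_merge(&partials, p) {
            // A key must never show up in two final partitions.
            assert!(totals.insert(k, v).is_none());
        }
    }

    let mut sorted: Vec<_> = totals.into_iter().collect();
    sorted.sort();
    println!("{:?}", sorted); // prints [(1, 13), (2, 12), (3, 1), (4, 3)]
}
```

The sketch only models the case where partial aggregation actually runs; it does not model `skip partial aggregations`, where the partial stage stops building its table, which is the situation the comment says makes this approach less general.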
