alamb opened a new issue, #13449: URL: https://github.com/apache/datafusion/issues/13449
### Is your feature request related to a problem or challenge?

While looking at the results of the most recent ClickBench run (https://github.com/apache/datafusion/issues/13099), I see there are a few queries where DataFusion is significantly slower. The queries are:

- Q18: https://github.com/apache/datafusion/blob/73507c307487708deb321e1ba4e0d302084ca27e/benchmarks/queries/clickbench/queries.sql#L19
- Q35: https://github.com/apache/datafusion/blob/73507c307487708deb321e1ba4e0d302084ca27e/benchmarks/queries/clickbench/queries.sql#L36

### Describe the solution you'd like

I would like these queries to go faster.

### Describe alternatives you've considered

Both queries look like

```sql
SELECT COUNT(...) cnt ... ORDER BY cnt DESC LIMIT 10
```

In other words, they are "top 10 count" style queries.

By default, DataFusion computes the counts for all groups and then picks only the top 10. I suspect there is a smarter way to do this, perhaps by finding the top 10 count values while emitting from the group-by operator.

It would be interesting to see what other engines, such as DuckDB, do with this query.

### Additional context

_No response_
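A minimal sketch of the idea floated above: select the top K groups while emitting from the group table, instead of materializing and sorting every group. It is written in Rust (DataFusion's implementation language) but uses only `std`. The functions `count_groups`, `top_k_counts`, and `top_k_by_full_sort` are hypothetical illustrations, not DataFusion operators or APIs. Note that it still counts every group; the bounded heap only replaces the full sort, cutting selection cost from roughly O(G log G) to O(G log K) for G groups.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Hypothetical GROUP BY step: count occurrences of each key.
fn count_groups(keys: &[&str]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for &k in keys {
        *counts.entry(k.to_string()).or_insert(0u64) += 1;
    }
    counts
}

/// Hypothetical top-K emit: keep only the `k` largest counts in a bounded heap.
/// The heap stores (Reverse(count), key), so its maximum (the one `pop` removes)
/// is the smallest count, and among equal counts the largest key.
fn top_k_counts(counts: &HashMap<String, u64>, k: usize) -> Vec<(String, u64)> {
    let mut heap: BinaryHeap<(Reverse<u64>, &str)> = BinaryHeap::with_capacity(k + 1);
    for (key, &cnt) in counts {
        heap.push((Reverse(cnt), key.as_str()));
        if heap.len() > k {
            heap.pop(); // evict the current weakest candidate
        }
    }
    // Ascending order of (Reverse(count), key) = count DESC, key ASC.
    heap.into_sorted_vec()
        .into_iter()
        .map(|(Reverse(c), key)| (key.to_string(), c))
        .collect()
}

/// Baseline behavior described in the issue: sort all groups, then truncate.
fn top_k_by_full_sort(counts: &HashMap<String, u64>, k: usize) -> Vec<(String, u64)> {
    let mut all: Vec<(String, u64)> = counts.iter().map(|(key, &c)| (key.clone(), c)).collect();
    all.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    all.truncate(k);
    all
}

fn main() {
    let keys = ["a", "b", "a", "c", "b", "a", "d"];
    let counts = count_groups(&keys);
    let top = top_k_counts(&counts, 2);
    // Both strategies should return the same rows.
    assert_eq!(top, top_k_by_full_sort(&counts, 2));
    println!("{:?}", top);
}
```

Ties on `cnt` are broken by key ascending so that the heap and the full-sort baseline agree; SQL itself leaves the order (and selection) of tied rows unspecified.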
