alamb opened a new issue, #13449:
URL: https://github.com/apache/datafusion/issues/13449

   ### Is your feature request related to a problem or challenge?
   
   While looking at the results of the most recent clickbench run
   - https://github.com/apache/datafusion/issues/13099
   
   I see there are a few queries where DataFusion is significantly slower
   ![Screenshot 2024-11-16 at 7 56 17 AM](https://github.com/user-attachments/assets/1363c52d-16e1-415d-86d3-12d0446933ac)
   
   The queries are:
   
   Q18: https://github.com/apache/datafusion/blob/73507c307487708deb321e1ba4e0d302084ca27e/benchmarks/queries/clickbench/queries.sql#L19
   
   Q35: https://github.com/apache/datafusion/blob/73507c307487708deb321e1ba4e0d302084ca27e/benchmarks/queries/clickbench/queries.sql#L36
   
   
   ### Describe the solution you'd like
   
   I would like these queries to run faster.
   
   ### Describe alternatives you've considered
   
   Both queries have this shape:
   ```sql
   SELECT COUNT(...) cnt ... ORDER BY cnt DESC LIMIT 10
   ```
   
   In other words, they are "top 10 count" style queries.
   
   By default, DataFusion will compute the counts for all groups, and then pick only the top 10.
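   
   A minimal sketch of that two-step strategy, in plain Rust using only the standard library. This is not DataFusion's actual operator code; `count_groups` and `top_k` are hypothetical names chosen for illustration, and the bounded heap in step 2 is just one common way to pick the top `k` without sorting every group.
   
   ```rust
   use std::cmp::Reverse;
   use std::collections::{BinaryHeap, HashMap};
   
   /// Step 1 (hypothetical helper): count rows per group key.
   /// Every distinct key gets an entry, however small its count.
   fn count_groups<'a>(rows: &[&'a str]) -> HashMap<&'a str, u64> {
       let mut counts = HashMap::new();
       for &key in rows {
           *counts.entry(key).or_insert(0) += 1;
       }
       counts
   }
   
   /// Step 2 (hypothetical helper): keep only the `k` largest counts.
   /// A min-heap bounded to `k` entries avoids sorting all groups.
   fn top_k<'a>(counts: &HashMap<&'a str, u64>, k: usize) -> Vec<(&'a str, u64)> {
       let mut heap = BinaryHeap::with_capacity(k + 1);
       for (&key, &cnt) in counts {
           // `Reverse` turns the max-heap into a min-heap on (count, key).
           heap.push(Reverse((cnt, key)));
           if heap.len() > k {
               heap.pop(); // evict the smallest entry seen so far
           }
       }
       let mut out: Vec<(&str, u64)> = heap
           .into_iter()
           .map(|Reverse((cnt, key))| (key, cnt))
           .collect();
       // Order by count descending, then key ascending to break ties.
       out.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
       out
   }
   ```
   
   Calling `top_k(&count_groups(&rows), 10)` mirrors `ORDER BY cnt DESC LIMIT 10`. Note that step 1 still materializes a count for every group, which is the cost this issue is about; the heap only makes the final selection cheaper.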
   
   I suspect there is a more efficient way to do this, perhaps by finding only the top 10 count values when emitting from the group operator, or something similar. It would be interesting to see what other engines, such as DuckDB, do with this query.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

