geoffreyclaude opened a new issue, #15559: URL: https://github.com/apache/datafusion/issues/15559
### Is your feature request related to a problem or challenge?

Currently, the benchmarks folder in DataFusion does not include dedicated benchmarks for TopK queries (i.e., queries of the form `SELECT ... ORDER BY a LIMIT n`). With ongoing work to optimize these types of queries, dedicated benchmarks would be valuable for measuring progress.

### Describe the solution you'd like

There are already sorting benchmarks based on the TPCH dataset. Since a TopK query is essentially a sort operation with an additional limit, we can extend the existing `sort_tpch` benchmarks by introducing an optional `LIMIT n` clause. This modification would effectively convert them into proper TopK benchmarks.

### Describe alternatives you've considered

_No response_

### Additional context

Relevant recent issues:

- https://github.com/apache/datafusion/issues/15037
- https://github.com/apache/datafusion/issues/15529
- https://github.com/apache/datafusion/issues/15538
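
A minimal sketch of the proposed change, assuming the sort benchmark queries are available as plain SQL strings. The helper name `with_optional_limit` and the sample query are hypothetical and are not taken from the existing `sort_tpch` code. The sketch only illustrates how an optional limit could turn a sort query into a TopK query.

```rust
/// Hypothetical helper: append an optional `LIMIT n` clause to a sort query.
/// With `None`, the query stays a plain sort; with `Some(n)`, it becomes TopK.
fn with_optional_limit(base_query: &str, limit: Option<usize>) -> String {
    // Drop surrounding whitespace and any trailing semicolon so the
    // appended clause still yields valid SQL.
    let base = base_query.trim().trim_end_matches(';').trim_end();
    match limit {
        Some(n) => format!("{base} LIMIT {n}"),
        None => base.to_string(),
    }
}

fn main() {
    // Illustrative sort query over the TPCH `lineitem` table (an assumption,
    // not necessarily one of the existing benchmark queries).
    let sort_query =
        "SELECT l_linenumber, l_suppkey FROM lineitem ORDER BY l_linenumber";

    // Plain sort benchmark query.
    println!("{}", with_optional_limit(sort_query, None));
    // The same query run as a TopK benchmark.
    println!("{}", with_optional_limit(sort_query, Some(10)));
}
```

Keeping the limit optional would let the same set of queries serve both as sort benchmarks and as TopK benchmarks, so the two can be compared on identical data.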