alamb opened a new issue, #7196: URL: https://github.com/apache/arrow-datafusion/issues/7196
### Is your feature request related to a problem or challenge?

This pattern is common:

```
SELECT c1, c2
FROM t
ORDER BY c3
LIMIT 10
```

For example we have queries in IOx like the following (this is the same pattern @NGA-TRAN describes on https://github.com/apache/arrow-datafusion/issues/7162)

```
SELECT tag, value1, ...
FROM t
WHERE other_column = 'foo'
ORDER BY time
LIMIT 10
```

### Describe the solution you'd like

If the data *IS NOT* already sorted, what happens today is a plan like

```
LIMIT(fetch=10)
  SORT(sort_exprs=[c3] fetch=10)
    SCAN(...)
```

And the Sort can take partial advantage of the fetch -- and it will be better after @gruuya's change in https://github.com/apache/arrow-datafusion/pull/7180

We can probably do better still with a special operator like the following that uses some specialized structure (perhaps some type of heap)

```
TOPK(fetch=10, sort_exprs=[c3])
  SCAN(...)
```

### Describe alternatives you've considered

If the data is already sorted the right way, DataFusion can just read first N values and stop as described on https://github.com/apache/arrow-datafusion/issues/7162

### Additional context

_No response_
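To make the "some type of heap" idea concrete, here is a minimal sketch in Rust of keeping only the `fetch` smallest sort keys while streaming over unsorted input. It is an illustration under stated assumptions, not DataFusion code: `top_k_smallest` is a hypothetical helper, the sort key is assumed to be a plain `i64` in ascending order, and a real operator would also have to carry the other projected columns and work on Arrow record batches.

```rust
use std::collections::BinaryHeap;

/// Hypothetical helper (not a DataFusion API): returns the `k` smallest
/// values of `input` in ascending order, using a max-heap bounded to `k` items.
fn top_k_smallest<I>(input: I, k: usize) -> Vec<i64>
where
    I: IntoIterator<Item = i64>,
{
    if k == 0 {
        return Vec::new();
    }
    // The heap root is the largest value currently kept, i.e. the first
    // candidate to evict when a smaller value arrives.
    let mut heap: BinaryHeap<i64> = BinaryHeap::with_capacity(k);
    for v in input {
        if heap.len() < k {
            heap.push(v);
        } else if let Some(mut largest) = heap.peek_mut() {
            if v < *largest {
                // Overwrite the root; `PeekMut` restores heap order on drop.
                *largest = v;
            }
        }
    }
    // Ascending order, matching `ORDER BY c3 LIMIT k`.
    heap.into_sorted_vec()
}

fn main() {
    // Stand-in for the unsorted `c3` column produced by the scan.
    let c3 = vec![42, 7, 19, 3, 88, 1, 56, 23];
    let top = top_k_smallest(c3, 3);
    println!("{:?}", top); // prints [1, 3, 7]
}
```

With this structure, memory stays proportional to `fetch` rather than to the input size, and each input value costs at most one O(log fetch) heap adjustment; values larger than the current root are rejected with a single comparison.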
