alamb commented on issue #5230: URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1438442727
For what it is worth, I think a small performance regression for small inputs is fine. The challenge is going to be finding a query that actually sorts a large amount of data (most such queries use `LIMIT K` or something similar, so they don't need to sort the entire input).

I wonder if any of the benchmarks in https://github.com/apache/arrow-datafusion/tree/main/benchmarks show the effects of the change.

Another thing we might be able to do is cook up a small benchmark that re-sorts one of the TPCH tables (to model, for example, re-sorting a parquet file for better speed or compression).

I may be able to help with this over the next week or so. I am traveling this week, so my bandwidth is limited.
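
To illustrate why `LIMIT K` queries are poor candidates for measuring a full sort, here is a minimal Python sketch. It is only an illustration and does not show how DataFusion implements sorting or limits; the function names `top_k_full_sort` and `top_k_heap` are hypothetical. A top-K query only has to track the K smallest rows seen so far, so it never sorts the whole input and would not exercise the code path in question.

```python
import heapq
import random


def top_k_full_sort(rows, k):
    """Sort every row, then keep the first k (models ORDER BY followed by a cut)."""
    return sorted(rows)[:k]


def top_k_heap(rows, k):
    """Return the k smallest rows in ascending order without sorting all rows."""
    return heapq.nsmallest(k, rows)


# Hypothetical data standing in for one sort column of a large table.
random.seed(0)
rows = [random.randint(0, 1_000_000) for _ in range(100_000)]

# Both strategies return the same answer; only the amount of sorting differs.
assert top_k_full_sort(rows, 10) == top_k_heap(rows, 10)
```

A benchmark that re-sorts an entire table has no such shortcut, which is why it would be a better fit for measuring a change to the sort itself.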
