[GitHub] [arrow-datafusion] alamb commented on issue #5230: Use Arrow Row Format in SortExec

via GitHub Tue, 21 Feb 2023 05:00:31 -0800


alamb commented on issue #5230:
URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1438442727


   I think having a small performance regression for small inputs is fine, for 
what it is worth. The challenge is going to be finding a query that actually 
sorts large amounts of data (most such queries will use `LIMIT K` or something 
similar, so they don't need to sort the entire input).
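As a rough illustration of why `LIMIT K` queries usually avoid a full sort, here is a minimal Python sketch. It is not DataFusion's implementation: it assumes plain integer sort keys, and `top_k` is a hypothetical helper. It keeps only the K smallest values seen so far in a bounded max-heap, which is O(n log K) work instead of the O(n log n) needed to sort every row.

```python
import heapq


def top_k(rows, k):
    """Return the k smallest integers in `rows`, in ascending order.

    Python's heapq is a min-heap, so values are stored negated to get a
    max-heap whose root is the largest of the k values kept so far.
    """
    if k <= 0:
        return []
    heap = []
    for x in rows:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif x < -heap[0]:
            # x beats the current worst of the kept values: replace it.
            heapq.heapreplace(heap, -x)
    return sorted(-v for v in heap)


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    data = [rng.randint(0, 10**6) for _ in range(100_000)]
    # Same answer as sorting everything and then applying the limit.
    assert top_k(data, 10) == sorted(data)[:10]
```

The point is that a sort operator feeding a `LIMIT K` never materializes the full sorted output, so these queries would not exercise a faster full sort much.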
   
   I wonder whether any of the benchmarks in 
https://github.com/apache/arrow-datafusion/tree/main/benchmarks show the effects of this change.
   
   Another thing we might be able to do is cook up a small benchmark that 
re-sorts one of the TPCH tables (to model, for example, re-sorting a parquet 
file for better speed or compression). I may be able to help with this over 
the next week or so. I am traveling this week, so my bandwidth is limited.
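A minimal sketch of the shape such a re-sort benchmark could take, written in Python for brevity. Everything here is an assumption for illustration: the two-column synthetic table stands in for a real TPCH table, and `encode_row`, `make_table`, and `bench` are hypothetical helpers, not DataFusion or arrow-rs APIs. It compares sorting by a per-column tuple comparison against sorting by a row-format-style byte key, where each row is encoded so that plain byte comparison gives the correct order. The real Arrow row format also handles nulls, descending order, and variable-length types, which this sketch ignores. Python timings say nothing about the Rust implementation; the useful part is the harness: vary the input size, check both orderings agree, and compare.

```python
import random
import time


def encode_row(key):
    """Encode a tuple of signed 64-bit ints as bytes whose lexicographic
    order matches the tuple order. Adding 2**63 maps the signed range onto
    unsigned, so big-endian bytes compare correctly (memcmp-style)."""
    return b"".join((v + (1 << 63)).to_bytes(8, "big") for v in key)


def make_table(n, seed=0):
    """Hypothetical two-column table: a low-cardinality column and a wide one."""
    rng = random.Random(seed)
    return [
        (rng.randint(-1000, 1000), rng.randint(-(2**63), 2**63 - 1))
        for _ in range(n)
    ]


def bench(n):
    """Return (tuple_sort_seconds, byte_key_sort_seconds) for n rows.

    The byte-key timing includes the cost of encoding every row.
    """
    rows = make_table(n)
    t0 = time.perf_counter()
    by_tuple = sorted(rows)
    t1 = time.perf_counter()
    by_bytes = sorted(rows, key=encode_row)
    t2 = time.perf_counter()
    assert by_tuple == by_bytes  # both strategies must agree on the order
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        tuple_s, bytes_s = bench(n)
        print(f"n={n}: tuple sort {tuple_s:.4f}s, byte-key sort {bytes_s:.4f}s")
```

Running it across several sizes is what would show whether a regression at small inputs is outweighed by gains on large ones.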

