[GitHub] [arrow-datafusion] yjshen commented on pull request #2146: Buffer records in row format in memory for SortExec

GitBox Mon, 04 Apr 2022 21:44:18 -0700


yjshen commented on PR #2146:
URL: 
https://github.com/apache/arrow-datafusion/pull/2146#issuecomment-1088263158


   Thanks @alamb! This PR is more of a playground and testbed for using the row 
format for the sort's payload. It isn't mature yet, but I have some 
numbers we can discuss. 
   
   As the benchmark above shows, converting between row and columnar batches 
costs extra CPU work, and that cost pays off once columnar memory access 
becomes more expensive on a bigger dataset.
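   
   A minimal sketch of that trade-off, assuming a hypothetical two-column `i64` 
payload. The `Columns` struct and the function names are illustrative only and 
are not DataFusion APIs. The row path pays for two conversions but reads each 
output record from one contiguous location, while the columnar path skips the 
conversions but makes one random access per column per record:

```rust
/// Hypothetical two-column payload in columnar form: one Vec per column.
struct Columns {
    a: Vec<i64>,
    b: Vec<i64>,
}

/// Columnar -> row: pack each record's fields contiguously.
/// This is the extra CPU work paid up front.
fn to_rows(cols: &Columns) -> Vec<[i64; 2]> {
    cols.a
        .iter()
        .zip(cols.b.iter())
        .map(|(&a, &b)| [a, b])
        .collect()
}

/// Row -> columnar: emit records in sorted order.
/// Each record costs a single random access into `rows`.
fn to_columns(rows: &[[i64; 2]], order: &[usize]) -> Columns {
    let mut out = Columns {
        a: Vec::with_capacity(order.len()),
        b: Vec::with_capacity(order.len()),
    };
    for &i in order {
        let row = rows[i];
        out.a.push(row[0]);
        out.b.push(row[1]);
    }
    out
}

/// Purely columnar gather: no conversion, but one random access
/// per column for every output record.
fn gather_columns(cols: &Columns, order: &[usize]) -> Columns {
    Columns {
        a: order.iter().map(|&i| cols.a[i]).collect(),
        b: order.iter().map(|&i| cols.b[i]).collect(),
    }
}
```

   Both paths produce the same output; only the memory access pattern and the 
conversion cost differ, which is why the payload width and data size matter.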
   
   Clearly, we need an adaptive mechanism that chooses the memory layout based 
on the number of payload columns, the size of the input data, and so on. So 
I'm posting this experimental implementation with some micro-benchmark 
results to gather insights from others.
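   
   One possible shape for such a mechanism, as a rough sketch only. The enum, 
the function, and both thresholds are hypothetical placeholders: they are not 
part of this PR and are not derived from the benchmark numbers, so real cutoffs 
would have to come from measurement:

```rust
/// Memory layout used to buffer the sort payload (illustrative only).
#[derive(Debug, PartialEq)]
enum PayloadLayout {
    Columnar,
    Row,
}

// Hypothetical cutoffs, NOT measured values.
const MIN_PAYLOAD_COLUMNS: usize = 4;
const MIN_INPUT_BYTES: usize = 32 * 1024 * 1024;

/// Pick the row layout only when the payload is wide and the input is large,
/// i.e. when scattered columnar access is most likely to outweigh the
/// row <-> columnar conversion cost. Otherwise stay columnar.
fn choose_layout(payload_columns: usize, input_bytes: usize) -> PayloadLayout {
    if payload_columns >= MIN_PAYLOAD_COLUMNS && input_bytes >= MIN_INPUT_BYTES {
        PayloadLayout::Row
    } else {
        PayloadLayout::Columnar
    }
}
```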
   
   cc @Dandandan @houqp @tustvold you might be interested in this as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
