yjshen commented on PR #2146: URL: https://github.com/apache/arrow-datafusion/pull/2146#issuecomment-1088263158
Thanks @alamb! This PR is more of a playground and testbed for using the row format in the sort payload. It's not mature yet, but I have some numbers we can discuss.

As the benchmark above shows, converting between rows and columnar batches costs extra CPU, and that cost pays off once columnar memory access becomes more expensive on bigger datasets. We clearly need an adaptive mechanism that chooses the memory layout based on the number of payload columns, the size of the input data, and so on.

So I'm posting the experimental implementation with some micro-benchmark results to get input from others. cc @Dandandan @houqp @tustvold, you might be interested in this as well.
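
To make the idea of an adaptive choice concrete, here is a minimal sketch in Rust. It is not code from this PR or from DataFusion: `PayloadLayout`, `choose_payload_layout`, and both thresholds are hypothetical names and placeholder values, not measured results. It also assumes that more payload columns push the balance toward the row format, which the benchmark would need to confirm.

```rust
/// Memory layout used for the sort payload (the non-key columns).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PayloadLayout {
    /// Keep the payload in columnar batches (no conversion cost).
    Columnar,
    /// Convert the payload to a row format before sorting.
    Row,
}

/// Placeholder threshold: minimum number of payload columns before the
/// row format is considered. Not derived from any benchmark.
pub const MIN_PAYLOAD_COLUMNS_FOR_ROW: usize = 4;

/// Placeholder threshold: minimum input size in bytes before the
/// row format is considered. Not derived from any benchmark.
pub const MIN_INPUT_BYTES_FOR_ROW: usize = 64 * 1024 * 1024;

/// Picks a payload layout from the number of payload columns and the
/// input size. The row format is chosen only when both thresholds are
/// met, on the assumption that the conversion cost is repaid only for
/// wide payloads over large inputs.
pub fn choose_payload_layout(num_payload_columns: usize, input_bytes: usize) -> PayloadLayout {
    if num_payload_columns >= MIN_PAYLOAD_COLUMNS_FOR_ROW
        && input_bytes >= MIN_INPUT_BYTES_FOR_ROW
    {
        PayloadLayout::Row
    } else {
        PayloadLayout::Columnar
    }
}
```

A real mechanism would likely need thresholds tuned from benchmarks, and possibly further inputs such as column data types or cache sizes.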
