ozankabak commented on issue #5230: URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1452788537
@tustvold, can you advise us on how to use the row conversion facility most efficiently? It seems both @jaylmiller and we are seeing the same behavior, and I'd like to make sure we are using the tools at our disposal the right way. In summary, as batch sizes get smaller, row conversion seems to result in lower overall performance (probably because the gains do not justify the conversion cost for small batches). It would be great if you could take a look at how @jaylmiller is using the facility and let us know whether it is being used appropriately. If everything is done right and this behavior still persists, maybe we can then think about how to identify a reasonable default crossover point (which could be overridden via config) and use different approaches for different batch sizes. I don't like this kind of "impure" approach in general, but sometimes it yields great performance. The history of sort algorithms is full of such "hacks", so maybe this is one of the places where it makes sense 🙂
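
To make the crossover idea concrete, here is a minimal sketch (not DataFusion's actual code) of what a per-batch strategy choice could look like on top of arrow-rs. `sort_indices` and `ROW_FORMAT_THRESHOLD` are hypothetical names, and the threshold value is only a placeholder that would have to come from benchmarks and be overridable via config:

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array, StringArray};
use arrow::compute::{lexsort_to_indices, SortColumn};
use arrow::error::ArrowError;
use arrow::row::{RowConverter, SortField};

/// Hypothetical crossover point; the real default would need benchmarking.
const ROW_FORMAT_THRESHOLD: usize = 1024;

/// Sort one batch of columns lexicographically, picking the strategy by batch size.
fn sort_indices(columns: &[ArrayRef]) -> Result<Vec<u32>, ArrowError> {
    let num_rows = columns[0].len();

    if num_rows < ROW_FORMAT_THRESHOLD {
        // Small batch: the comparator-based kernel avoids the row conversion cost.
        let sort_columns: Vec<SortColumn> = columns
            .iter()
            .map(|c| SortColumn { values: c.clone(), options: None })
            .collect();
        let indices = lexsort_to_indices(&sort_columns, None)?;
        Ok(indices.values().to_vec())
    } else {
        // Large batch: pay the conversion cost once, then compare rows cheaply
        // (the row format compares with memcmp-like semantics).
        let mut converter = RowConverter::new(
            columns
                .iter()
                .map(|c| SortField::new(c.data_type().clone()))
                .collect(),
        )?;
        let rows = converter.convert_columns(columns)?;
        let mut indices: Vec<u32> = (0..num_rows as u32).collect();
        indices.sort_unstable_by(|&l, &r| rows.row(l as usize).cmp(&rows.row(r as usize)));
        Ok(indices)
    }
}

fn main() -> Result<(), ArrowError> {
    let a: ArrayRef = Arc::new(Int32Array::from(vec![3, 1, 2]));
    let b: ArrayRef = Arc::new(StringArray::from(vec!["c", "a", "b"]));
    let indices = sort_indices(&[a, b])?;
    assert_eq!(indices, vec![1, 2, 0]);
    Ok(())
}
```

The branch condition is the only "impure" part; everything else is what the two code paths already do today, which is why a configurable threshold seems cheap to try once we understand whether the current usage of `RowConverter` is correct.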
