ozankabak commented on issue #5230: URL: https://github.com/apache/arrow-datafusion/issues/5230#issuecomment-1452788537
@tustvold, can you advise us on how to use the row conversion facility most efficiently? It seems both @jaylmiller and we are seeing the same behavior, and I'd like to make sure we are using the tools at our disposal the right way. In summary, as batch sizes get smaller, row conversion seems to result in lower overall performance (probably because the gains do not justify the conversion cost for small batches). It would be great if you could take a look at how @jaylmiller is using the facility and let us know whether it is being used appropriately. If everything is done right and this behavior still persists, maybe we can then think about how to identify a reasonable default crossover point (which could be overridden via config) and use different approaches for different batch sizes. I don't like this kind of "impure" approach in general, but sometimes it yields great performance. The history of sort algorithms is full of such "hacks", so maybe this is one of the places where it makes sense 🙂
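
To make the crossover idea concrete, here is a minimal sketch (not DataFusion's actual code) of what a per-batch strategy choice could look like on top of arrow-rs. `sort_indices` and `ROW_FORMAT_THRESHOLD` are hypothetical names, and the threshold value is only a placeholder that would have to come from benchmarks and be overridable via config:

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int32Array, StringArray};
use arrow::compute::{lexsort_to_indices, SortColumn};
use arrow::error::ArrowError;
use arrow::row::{RowConverter, SortField};

/// Hypothetical crossover point; the real default would need benchmarking.
const ROW_FORMAT_THRESHOLD: usize = 1024;

/// Sort one batch of columns lexicographically, picking the strategy by batch size.
fn sort_indices(columns: &[ArrayRef]) -> Result<Vec<u32>, ArrowError> {
    let num_rows = columns[0].len();

    if num_rows < ROW_FORMAT_THRESHOLD {
        // Small batch: the comparator-based kernel avoids the row conversion cost.
        let sort_columns: Vec<SortColumn> = columns
            .iter()
            .map(|c| SortColumn { values: c.clone(), options: None })
            .collect();
        let indices = lexsort_to_indices(&sort_columns, None)?;
        Ok(indices.values().to_vec())
    } else {
        // Large batch: pay the conversion cost once, then compare rows cheaply
        // (the row format compares with memcmp-like semantics).
        let mut converter = RowConverter::new(
            columns
                .iter()
                .map(|c| SortField::new(c.data_type().clone()))
                .collect(),
        )?;
        let rows = converter.convert_columns(columns)?;
        let mut indices: Vec<u32> = (0..num_rows as u32).collect();
        indices.sort_unstable_by(|&l, &r| rows.row(l as usize).cmp(&rows.row(r as usize)));
        Ok(indices)
    }
}

fn main() -> Result<(), ArrowError> {
    let a: ArrayRef = Arc::new(Int32Array::from(vec![3, 1, 2]));
    let b: ArrayRef = Arc::new(StringArray::from(vec!["c", "a", "b"]));
    let indices = sort_indices(&[a, b])?;
    assert_eq!(indices, vec![1, 2, 0]);
    Ok(())
}
```

The branch condition is the only "impure" part; everything else is what the two code paths already do today, which is why a configurable threshold seems cheap to try once we understand whether the current usage of `RowConverter` is correct.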
