Rafferty97 commented on issue #20585: URL: https://github.com/apache/datafusion/issues/20585#issuecomment-3987760466
I have a concern that I think is worth discussing before we commit to this approach. Specifically, I think UDFs should be free to return whatever physical type is cheapest for them to produce; restricting them to mirroring their input type(s) would unnecessarily cap query performance. For example:

- Functions like `trim` and `substr` only produce views into the underlying data, so returning `Utf8View` is a clear performance win even if the input types aren't string views.
- Conversely, functions like `reverse` always allocate new data buffers, so it is more space efficient to return `Utf8` or `LargeUtf8` rather than `Utf8View`, as their offset buffers take up 4 or 8 bytes per element as opposed to `Utf8View`'s 16 bytes.

I understand there's an argument to be made about consistency, but the logical/physical planners already appear to be capable of inserting casts where needed so that these types can mix well.
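To make the space argument concrete, here is a rough back-of-the-envelope sketch in plain Rust (no Arrow dependency). The helper names are hypothetical and not part of DataFusion or arrow-rs. It assumes the output is freshly allocated (as with `reverse`), ignores validity bitmaps and buffer padding, and relies on the Arrow layout in which a 16-byte view stores strings of up to 12 bytes inline.

```rust
/// Bytes per offset entry in a `Utf8` array (i32 offsets).
const UTF8_OFFSET_BYTES: usize = 4;
/// Bytes per offset entry in a `LargeUtf8` array (i64 offsets).
const LARGE_UTF8_OFFSET_BYTES: usize = 8;
/// Bytes per element in the views buffer of a `Utf8View` array.
const VIEW_BYTES: usize = 16;
/// Strings up to this many bytes are stored inline in the view itself.
const VIEW_INLINE_LEN: usize = 12;

/// Hypothetical helper: approximate size of an offset-based string array.
/// An array of n strings needs n + 1 offsets plus the concatenated string bytes.
fn offset_array_bytes(lengths: &[usize], offset_width: usize) -> usize {
    let offsets = (lengths.len() + 1) * offset_width;
    let data: usize = lengths.iter().sum();
    offsets + data
}

/// Approximate size of a `Utf8` array holding strings of the given byte lengths.
fn utf8_bytes(lengths: &[usize]) -> usize {
    offset_array_bytes(lengths, UTF8_OFFSET_BYTES)
}

/// Approximate size of a `LargeUtf8` array holding strings of the given byte lengths.
fn large_utf8_bytes(lengths: &[usize]) -> usize {
    offset_array_bytes(lengths, LARGE_UTF8_OFFSET_BYTES)
}

/// Approximate size of a freshly built `Utf8View` array: one 16-byte view per
/// element, plus data-buffer bytes only for strings too long to be inlined.
fn utf8_view_bytes(lengths: &[usize]) -> usize {
    let views = lengths.len() * VIEW_BYTES;
    let data: usize = lengths
        .iter()
        .filter(|&&len| len > VIEW_INLINE_LEN)
        .sum();
    views + data
}
```

Under these assumptions, 1,000 newly allocated 20-byte strings come to roughly 24,004 bytes as `Utf8`, 28,008 bytes as `LargeUtf8`, and 36,000 bytes as `Utf8View`. The picture changes when a function can reuse the input's data buffers (as `trim` and `substr` can), since a view array then only pays for the views themselves.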
