andygrove commented on PR #19514: URL: https://github.com/apache/datafusion/pull/19514#issuecomment-3694848340
> Thanks @andygrove the numbers make huge sense to me, what prob concerns is DF using `make_scalar_function` in lots of places so the same problem can be relevant for others. > > Would you mind adding more details what is the exact reason and what scalar/array combinations are the most trouble? > > this info might be important how DF treats bultin functions now Basically, the `make_scalar_function` functions (there are three implementations, two of which are identical and one with slightly different behavior) convert scalar arguments into arrays. This adds unnecessary overhead in some cases because Arrow has specialized implementations of some kernels with fast paths for scalar arguments. For example, for `contains`, Arrow has `pub fn contains(left: &dyn Datum, right: &dyn Datum)` and `Datum` is implemented for arrays and scalars. The implementation has special handling for scalars. The three implementations of `make_scalar_function` can be found in these files: - `datafusion/functions/src/utils.rs` - `datafusion/functions-nested/src/utils.rs` - `datafusion/spark/src/function/functions_nested_utils.rs` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
