Re: [PR] perf: Optimize `contains` for scalar search arg [datafusion]

via GitHub Sun, 28 Dec 2025 07:59:19 -0800


andygrove commented on PR #19514:
URL: https://github.com/apache/datafusion/pull/19514#issuecomment-3694848340


   > Thanks @andygrove the numbers make huge sense to me, what prob concerns is 
DF using `make_scalar_function` in lots of places so the same problem can be 
relevant for others.
   > 
   > Would you mind adding more details what is the exact reason and what 
scalar/array combinations are the most trouble?
   > 
   > this info might be important how DF treats bultin functions now
   
   Basically, the `make_scalar_function` functions (there are three 
implementations, two of which are identical and one with slightly different 
behavior) convert scalar arguments into arrays. This adds unnecessary overhead 
in some cases because Arrow has specialized implementations of some kernels 
with fast paths for scalar arguments. For example, for `contains`, Arrow has 
`pub fn contains(left: &dyn Datum, right: &dyn Datum)` and `Datum` is 
implemented for arrays and scalars. The implementation has special handling for 
scalars.
   
   The three implementations of `make_scalar_function` can be found in these 
files:
   
   - `datafusion/functions/src/utils.rs`
   - `datafusion/functions-nested/src/utils.rs`
   - `datafusion/spark/src/function/functions_nested_utils.rs`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] perf: Optimize `contains` for scalar search arg [datafusion]

Reply via email to