alamb commented on code in PR #8878:
URL: https://github.com/apache/arrow-datafusion/pull/8878#discussion_r1453295369
##########
datafusion-examples/examples/simple_udf.rs:
##########
@@ -61,17 +62,24 @@ async fn main() -> Result<()> {
let ctx = create_context()?;
// First, declare the actual implementation of the calculation
- let pow = |args: &[ArrayRef]| {
+ let pow = Arc::new(|args: &[ColumnarValue]| {
// in DataFusion, all `args` and output are dynamically-typed arrays,
which means that we need to:
// 1. cast the values to the type we want
// 2. perform the computation for every element in the array (using a
loop or SIMD) and construct the result
// this is guaranteed by DataFusion based on the function's signature.
assert_eq!(args.len(), 2);
+ let ColumnarValue::Array(arg0) = &args[0] else {
Review Comment:
I think the more correct pattern is to create an array with
[`into_array`](https://docs.rs/datafusion/latest/datafusion/physical_plan/enum.ColumnarValue.html#method.into_array)
```suggestion
let arg0 = args[0].into_array(num_rows)
```
Though now I wrote that it is not super clear to me where the num_rows
should come from 🤔
Also, maybe this is more of the fix for
https://github.com/apache/arrow-datafusion/issues/8866 -- that for a volatile
function, it shouldn't be passed a scalar but unstead the array should be
expanded prior to invocation
##########
datafusion/proto/tests/cases/roundtrip_logical_plan.rs:
##########
@@ -1592,9 +1591,12 @@ fn roundtrip_aggregate_udf() {
#[test]
fn roundtrip_scalar_udf() {
- let fn_impl = |args: &[ArrayRef]| Ok(Arc::new(args[0].clone()) as
ArrayRef);
-
- let scalar_fn = make_scalar_function(fn_impl);
+ let scalar_fn = Arc::new(|args: &[ColumnarValue]| {
+ let ColumnarValue::Array(array) = &args[0] else {
+ panic!()
+ };
+ Ok(ColumnarValue::Array(Arc::new(array.clone()) as ArrayRef))
Review Comment:
I wonder if we could make this code easier to work with using `From` impls,
like
```suggestion
Ok(ColumnarValue::from(Arc::new(array.clone()) as ArrayRef))
```
Maybe it doesn't matter 🤔
##########
datafusion/physical-expr/src/functions.rs:
##########
@@ -191,9 +191,11 @@ pub(crate) enum Hint {
AcceptsSingular,
}
-/// decorates a function to handle [`ScalarValue`]s by converting them to
arrays before calling the function
+/// Decorates a function to handle [`ScalarValue`]s by converting them to
arrays before calling the function
/// and vice-versa after evaluation.
-pub fn make_scalar_function<F>(inner: F) -> ScalarFunctionImplementation
+/// Note that this function makes a scalar function with no arguments or all
scalar inputs return a scalar.
+/// That's said its output will be same for all input rows in a batch.
+pub(crate) fn make_scalar_function<F>(inner: F) -> ScalarFunctionImplementation
Review Comment:
For better or worse, since this function is in the `example_udf.rs` file, I
think it is likely to be widely used when people followed the example model.
Thus I think this change is likely to be disruptive to exising users.
To help soften the blow, I agree that deprecating it for a release or two
would be the best
Perhaps we could do something like
```suggestion
#[deprecated(since = "35.0.0", note = "Implement your function directly in
terms of ColumnarValue or use `ScalarUDF` instead")]
pub fn make_scalar_function<F>(inner: F) -> ScalarFunctionImplementation
make_scaler_function_inner(inner)
}
/// Internal implementation, see comments on `make_scalar_function` for
caveats
pub(crate) fn make_scalar_function_inner<F>(inner: F) ->
ScalarFunctionImplementation
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]