alamb commented on code in PR #10193: URL: https://github.com/apache/datafusion/pull/10193#discussion_r1576665672
########## datafusion/core/tests/user_defined/user_defined_scalar_functions.rs: ########## @@ -615,6 +493,22 @@ async fn test_user_defined_functions_cast_to_i64() -> Result<()> { Ok(()) } +#[tokio::test] +async fn deregister_udf() -> Result<()> { Review Comment: 👍 ########## datafusion/expr/src/udf.rs: ########## @@ -322,11 +322,6 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { /// The function will be invoked passed with the slice of [`ColumnarValue`] /// (either scalar or array). /// - /// # Zero Argument Functions - /// If the function has zero parameters (e.g. `now()`) it will be passed a - /// single element slice which is a a null array to indicate the batch's row - /// count (so the function can know the resulting array size). - /// Review Comment: Can we please leave a note about what is required to implement `Zero Argument Functions`? I think the expectation is that the output is a single `ColumnarValue::Scalar`, rather than an Array ########## datafusion/functions/src/math/random.rs: ########## @@ -64,45 +61,9 @@ impl ScalarUDFImpl for RandomFunc { Ok(Float64) } - fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> { - random(args) - } -} - -/// Random SQL function -fn random(args: &[ColumnarValue]) -> Result<ColumnarValue> { - let len: usize = match &args[0] { - ColumnarValue::Array(array) => array.len(), - _ => return exec_err!("Expect random function to take no param"), - }; - let mut rng = thread_rng(); - let values = iter::repeat_with(|| rng.gen_range(0.0..1.0)).take(len); - let array = Float64Array::from_iter_values(values); - Ok(ColumnarValue::Array(Arc::new(array))) -} - -#[cfg(test)] -mod test { - use std::sync::Arc; - - use arrow::array::NullArray; - - use datafusion_common::cast::as_float64_array; - use datafusion_expr::ColumnarValue; - - use crate::math::random::random; - - #[test] - fn test_random_expression() { - let args = vec![ColumnarValue::Array(Arc::new(NullArray::new(1)))]; - let array = random(&args) - .expect("failed to initialize function random") - .into_array(1) - .expect("Failed to convert to array"); - let floats = - as_float64_array(&array).expect("failed to initialize function random"); - - assert_eq!(floats.len(), 1); - assert!(0.0 <= floats.value(0) && floats.value(0) < 1.0); + fn invoke(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue> { Review Comment: I think the idea here is that expectation is that `rand` is invoked *once per row* rather than *once per batch*. And the only way it knew how many rows to make is to get a null array in 🤔 For example, when I run `datafusion-cli` from this PR to call random() the same value is returned for each row: ```sql > create table foo as values (1), (2), (3), (4), (5); 0 row(s) fetched. Elapsed 0.018 seconds. > select column1, random() from foo; +---------+--------------------+ | column1 | random() | +---------+--------------------+ | 1 | 0.9594375709000513 | | 2 | 0.9594375709000513 | | 3 | 0.9594375709000513 | | 4 | 0.9594375709000513 | | 5 | 0.9594375709000513 | +---------+--------------------+ 5 row(s) fetched. Elapsed 0.012 seconds. ``` But I expect that each row has a different value for `random()` However, since none of the tests failed, clearly we have a gap in test coverage 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org