jkosh44 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2640206878
@jayzhan211 I'm sorry to draw this out more, but I've been thinking about this change recently and now that I understand the code a little better, I think we should remove `NullHandling`. Here's why. `NullHandling` only handles inputs when they are `ColumnarValue::Scalar(_)` and does not handle inputs that are `ColumnarValue::Array(_)`. So function implementers have to account for null `ColumnarValue::Array(_)` values in their implementation. Whatever the implementer does to account for `ColumnarValue::Array(_)` _should_ also work for `ColumnarValue::Scalar(_)`. The only reason that is doesn't currently work for many of the functions, is that their signatures are `signature: Signature::variadic_any(Volatility::Immutable),`. This causes null scalar inputs to be passed in as `ColumnarValue::Scalar(ScalarValue::Null)` instead of the null value of the expected type (for example `ColumnarValue::Scalar(ScalarValue::Int32(None))`). When the implementation tries to treat `ColumnarValue::Scalar(ScalarValue::Null)` as the expected type either an error is returned or a panic happens. However, if we update a functions signature to actually describe the expected types, then a null scalar input is passed in as the null value for that type. Furthermore, we can't add `NullHandling::Propagate` to a function without updating it's signature or else we risk accepting invalid argument types when one of the arguments are null. So, what ends up happening is that `NullHandling::Propagate` is always a no-op and ends up not being useful, or it compensates for a non-descriptive signature. In fact if we apply the following diff, which comments out `NullHandling::Propagate` from `array_slice`, all of the tests still pass. (Note: I also realized that I accidentally added a 2 argument variant to `array_slice` which isn't valid so the diff includes fixing that). ``` diff --git a/datafusion/functions-nested/src/extract.rs b/datafusion/functions-nested/src/extract.rs index 8c120876c..32af568ee 100644 --- a/datafusion/functions-nested/src/extract.rs +++ b/datafusion/functions-nested/src/extract.rs @@ -329,11 +329,6 @@ impl ArraySlice { Self { signature: Signature::one_of( vec![ - TypeSignature::ArraySignature( - ArrayFunctionSignature::ArrayAndIndexes( - NonZeroUsize::new(1).expect("1 is non-zero"), - ), - ), TypeSignature::ArraySignature( ArrayFunctionSignature::ArrayAndIndexes( NonZeroUsize::new(2).expect("2 is non-zero"), @@ -390,9 +385,9 @@ impl ScalarUDFImpl for ArraySlice { Ok(arg_types[0].clone()) } - fn null_handling(&self) -> NullHandling { - NullHandling::Propagate - } + // fn null_handling(&self) -> NullHandling { + // NullHandling::Propagate + // } fn invoke_batch( &self, diff --git a/datafusion/sqllogictest/test_files/array.slt b/datafusion/sqllogictest/test_files/array.slt index d804bb424..fc80ec79d 100644 --- a/datafusion/sqllogictest/test_files/array.slt +++ b/datafusion/sqllogictest/test_files/array.slt @@ -1850,15 +1850,11 @@ select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), 0, [] [] # array_slice scalar function #11 (with NULL-NULL) -query ?? +query error select array_slice(make_array(1, 2, 3, 4, 5), NULL), array_slice(make_array('h', 'e', 'l', 'l', 'o'), NULL); ----- -NULL NULL -query ?? +query error select array_slice(arrow_cast(make_array(1, 2, 3, 4, 5), 'LargeList(Int64)'), NULL), array_slice(arrow_cast(make_array('h', 'e', 'l', 'l', 'o'), 'LargeList(Utf8)'), NULL); ----- -NULL NULL # array_slice scalar function #12 (with zero and negative number) query ?? ``` What do you think? If you agree then I'd like to get a fix in before the next release is cut, whenever that is. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org