GaneshPatil7517 opened a new pull request, #19765: URL: https://github.com/apache/datafusion/pull/19765
## Which issue does this PR close?

- Closes https://github.com/apache/datafusion/issues/19511
- Related to https://github.com/apache/datafusion/issues/18882

## Rationale for this change

Currently, AggregateUDFImpl::is_nullable() returns true by default for all UDAFs, regardless of input characteristics. This is not ideal because:

- The same nullability information is already encoded in return_field()
- Most aggregate functions should only be nullable if their inputs are nullable (e.g., MIN, MAX, SUM)
- This pattern doesn't align with scalar UDFs, which already use return_field_from_args() for nullability

## What changes are included in this PR?

### Core Changes

- Deprecated is_nullable() on the AggregateUDFImpl trait, with migration guidance
- Updated udaf_default_return_field() to compute nullability from input fields:
  - Output is nullable if ANY input field is nullable
  - Output is non-nullable only if ALL inputs are non-nullable

### Tests

Added 4 new tests validating nullability inference:

- test_return_field_nullability_from_nullable_input
- test_return_field_nullability_from_non_nullable_input
- test_return_field_nullability_with_mixed_inputs
- test_return_field_preserves_return_type

### Documentation

- New docs/source/library-user-guide/functions/udf-nullability.md with migration guide and examples
- Updated adding-udfs.md with a reference to the nullability documentation

## Are these changes tested?

Yes. All existing tests pass, plus 4 new tests specifically for nullability behavior.

## Are there any user-facing changes?

- Deprecation warning: Users implementing is_nullable() will see a deprecation warning directing them to use return_field() instead.
- Behavioral change: Default nullability now depends on input field nullability rather than always returning true. Functions like COUNT that need to always return non-nullable should override return_field().

This is a potentially breaking change for users who rely on the previous behavior of always-nullable outputs, but the new behavior is more correct and aligns with scalar UDF patterns.
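
To make the inference rule concrete, here is a minimal standalone sketch of the behavior described above. It does not use the DataFusion or Arrow crates: the `Field` struct, `default_return_field`, and `count_return_field` are hypothetical stand-ins written for illustration, not the code in this PR. One consequence of this particular sketch (an assumption, not something the PR states) is that an aggregate with zero inputs comes out non-nullable.

```rust
/// Hypothetical stand-in for an Arrow field: just a name, a type label,
/// and a nullability flag. Not the real `arrow::datatypes::Field`.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: String,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Self {
            name: name.to_string(),
            data_type: data_type.to_string(),
            nullable,
        }
    }
}

/// Sketch of the default rule from the PR description: the output is
/// nullable if ANY input is nullable, and non-nullable only if ALL
/// inputs are non-nullable. The return type is passed through unchanged.
/// Assumption of this sketch: with no inputs, `any` yields `false`.
pub fn default_return_field(name: &str, return_type: &str, inputs: &[Field]) -> Field {
    let nullable = inputs.iter().any(|f| f.nullable);
    Field::new(name, return_type, nullable)
}

/// Sketch of a COUNT-like override: the result is never null, whatever
/// the nullability of the inputs, so the inputs are ignored.
pub fn count_return_field(name: &str, _inputs: &[Field]) -> Field {
    Field::new(name, "Int64", false)
}
```

With a nullable input `a` and a non-nullable input `b`, `default_return_field("min", "Int32", &[a, b])` yields a nullable field, while `count_return_field` yields a non-nullable one for the same inputs. This mirrors the mixed-input and COUNT cases mentioned in the description.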
