GaneshPatil7517 opened a new pull request, #19765:
URL: https://github.com/apache/datafusion/pull/19765

   Which issue does this PR close?
   Closes https://github.com/apache/datafusion/issues/19511
   Related to https://github.com/apache/datafusion/issues/18882
   
   Rationale for this change
   Currently, AggregateUDFImpl::is_nullable() returns true by default for all 
UDAFs, regardless of input characteristics. This is not ideal because:
   
   - The same nullability information is already encoded in return_field().
   - Most aggregate functions (e.g., MIN, MAX, SUM) should be nullable only 
if their inputs are nullable.
   - This pattern doesn't align with scalar UDFs, which already use 
return_field_from_args() for nullability.
   
   What changes are included in this PR?
   Core Changes
   - Deprecated is_nullable() on the AggregateUDFImpl trait, with migration 
guidance.
   - Updated udaf_default_return_field() to compute nullability from input 
fields:
     - Output is nullable if ANY input field is nullable.
     - Output is non-nullable only if ALL inputs are non-nullable.
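   
   The rule above can be sketched in plain Rust. This is an illustration only, 
not the code in this PR: InputField and infer_output_nullable are hypothetical 
stand-ins for the Arrow field type and for the logic inside 
udaf_default_return_field().

```rust
// Hypothetical stand-in for an Arrow field; NOT DataFusion's actual type.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct InputField {
    name: String,
    nullable: bool,
}

impl InputField {
    fn new(name: &str, nullable: bool) -> Self {
        InputField {
            name: name.to_string(),
            nullable,
        }
    }
}

/// Hypothetical helper: the output is nullable if ANY input is nullable,
/// and non-nullable only if ALL inputs are non-nullable.
fn infer_output_nullable(inputs: &[InputField]) -> bool {
    inputs.iter().any(|field| field.nullable)
}
```

   Note that in this sketch an empty input list yields a non-nullable output, 
because any() over an empty iterator is false. The PR description does not say 
how the real implementation treats that case.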
   Tests
   Added 4 new tests validating nullability inference:
   
   test_return_field_nullability_from_nullable_input
   test_return_field_nullability_from_non_nullable_input
   test_return_field_nullability_with_mixed_inputs
   test_return_field_preserves_return_type
   Documentation
   New docs/source/library-user-guide/functions/udf-nullability.md with 
migration guide and examples
   Updated adding-udfs.md with reference to nullability documentation
   Are these changes tested?
   Yes. All existing tests pass, and 4 new tests specifically cover 
nullability behavior.
   
   Are there any user-facing changes?
   Deprecation warning: Users implementing is_nullable() will see a deprecation 
warning directing them to use return_field() instead.
   
   Behavioral change: Default nullability now depends on input field 
nullability rather than always returning true. Functions like COUNT that need 
to always return non-nullable should override return_field().
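   
   A minimal sketch of that override pattern, assuming a heavily simplified 
trait. AggregateSketch, SketchField, MinSketch, and CountSketch are 
hypothetical names; the real AggregateUDFImpl::return_field() has a different 
signature.

```rust
// Hypothetical, simplified stand-ins; the real DataFusion types differ.
#[derive(Debug, Clone, PartialEq)]
struct SketchField {
    name: String,
    nullable: bool,
}

trait AggregateSketch {
    fn name(&self) -> &str;

    /// Default behavior: nullable if any input field is nullable.
    fn return_field(&self, inputs: &[SketchField]) -> SketchField {
        SketchField {
            name: self.name().to_string(),
            nullable: inputs.iter().any(|field| field.nullable),
        }
    }
}

/// Relies on the default, so its nullability follows its inputs.
struct MinSketch;

impl AggregateSketch for MinSketch {
    fn name(&self) -> &str {
        "min"
    }
}

/// Overrides the default so the output is never nullable.
struct CountSketch;

impl AggregateSketch for CountSketch {
    fn name(&self) -> &str {
        "count"
    }

    fn return_field(&self, _inputs: &[SketchField]) -> SketchField {
        SketchField {
            name: self.name().to_string(),
            nullable: false,
        }
    }
}
```

   With a nullable input, MinSketch reports a nullable output while 
CountSketch still reports a non-nullable one.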
   
   This is a potentially breaking change for users who rely on the previous 
always-nullable outputs, but the new behavior reflects input nullability more 
accurately and aligns with scalar UDF patterns.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

