This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 8ef884d3c4 Change default of 
`AggregateUDFImpl::supports_null_handling_clause` to `false` (#18441)
8ef884d3c4 is described below

commit 8ef884d3c4cbb5e0c961ab307d1d16aea6836f0c
Author: Jeffrey Vo <[email protected]>
AuthorDate: Thu Nov 13 19:34:11 2025 +1100

    Change default of `AggregateUDFImpl::supports_null_handling_clause` to 
`false` (#18441)
    
    ## Which issue does this PR close?
    
    <!--
    We generally require a GitHub issue to be filed for all bug fixes and
    enhancements and this helps us generate change logs for our releases.
    You can link an issue to this PR using the GitHub syntax. For example
    `Closes #123` indicates that this PR will close issue #123.
    -->
    
    - Part of #9924
    
    ## Rationale for this change
    
    <!--
    Why are you proposing this change? If this is already explained clearly
    in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand
    your changes and offer better suggestions for fixes.
    -->
    
    Setting a default of `true` is too permissive; we can see it allows
    specifying it on `median` for example even though that function doesn't
    take the config into account. It seems only `array_agg`, `first_value`
    and `last_value` actually respect the config this setting handles so it
    makes more sense to make this false by default.
    
    ## What changes are included in this PR?
    
    <!--
    There is no need to duplicate the description in the issue here but it
    is sometimes worth providing a summary of the individual changes in this
    PR.
    -->
    
    Change default of `AggregateUDFImpl::supports_null_handling_clause` to
    `false` (from `true`). Adjust `array_agg`, `first_value` and
    `last_value` to implement it as `true`.
    
    ## Are these changes tested?
    
    <!--
    We typically require tests for all PRs in order to:
    1. Prevent the code from being accidentally broken by subsequent changes
    2. Serve as another way to document the expected behavior of the code
    
    If tests are not included in your PR, please explain why (for example,
    are they covered by existing tests)?
    -->
    
    Existing tests, also added a negative case for `median` (expect SQL
    parsing to fail if providing null handling clause).
    
    ## Are there any user-facing changes?
    
    <!--
    If there are user-facing changes then we may require documentation to be
    updated before approving the PR.
    -->
    
    Behaviour change as default of a trait method is changing. Added section
    to upgrade guide.
    
    <!--
    If there are any breaking changes to public APIs, please add the `api
    change` label.
    -->
---
 datafusion/expr/src/udaf.rs                        | 10 +++++---
 .../src/approx_percentile_cont.rs                  |  4 ---
 .../src/approx_percentile_cont_with_weight.rs      |  4 ---
 datafusion/functions-aggregate/src/array_agg.rs    |  4 +++
 datafusion/functions-aggregate/src/first_last.rs   |  8 ++++++
 .../functions-aggregate/src/percentile_cont.rs     |  4 ---
 datafusion/sqllogictest/test_files/aggregate.slt   |  7 ++++++
 docs/source/library-user-guide/upgrading.md        | 29 +++++++++++++++++++---
 8 files changed, 52 insertions(+), 18 deletions(-)

diff --git a/datafusion/expr/src/udaf.rs b/datafusion/expr/src/udaf.rs
index 42a5f9b262..e2ae697dee 100644
--- a/datafusion/expr/src/udaf.rs
+++ b/datafusion/expr/src/udaf.rs
@@ -740,10 +740,14 @@ pub trait AggregateUDFImpl: Debug + DynEq + DynHash + 
Send + Sync {
         ScalarValue::try_from(data_type)
     }
 
-    /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` clause, 
return true
-    /// If the function does not, return false
+    /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` SQL clause,
+    /// return `true`. Otherwise, return `false` which will cause an error to 
be
+    /// raised during SQL parsing if these clauses are detected for this 
function.
+    ///
+    /// Functions which implement this as `true` are expected to handle the 
resulting
+    /// null handling config present in [`AccumulatorArgs`], `ignore_nulls`.
     fn supports_null_handling_clause(&self) -> bool {
-        true
+        false
     }
 
     /// If this function supports the `WITHIN GROUP (ORDER BY column 
[ASC|DESC])`
diff --git a/datafusion/functions-aggregate/src/approx_percentile_cont.rs 
b/datafusion/functions-aggregate/src/approx_percentile_cont.rs
index 4015abc6ad..6a09d159f6 100644
--- a/datafusion/functions-aggregate/src/approx_percentile_cont.rs
+++ b/datafusion/functions-aggregate/src/approx_percentile_cont.rs
@@ -315,10 +315,6 @@ impl AggregateUDFImpl for ApproxPercentileCont {
         Ok(arg_types[0].clone())
     }
 
-    fn supports_null_handling_clause(&self) -> bool {
-        false
-    }
-
     fn supports_within_group_clause(&self) -> bool {
         true
     }
diff --git 
a/datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs 
b/datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs
index 51891ce7f2..42aba0b95e 100644
--- a/datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs
+++ b/datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs
@@ -258,10 +258,6 @@ impl AggregateUDFImpl for ApproxPercentileContWithWeight {
         self.approx_percentile_cont.state_fields(args)
     }
 
-    fn supports_null_handling_clause(&self) -> bool {
-        false
-    }
-
     fn supports_within_group_clause(&self) -> bool {
         true
     }
diff --git a/datafusion/functions-aggregate/src/array_agg.rs 
b/datafusion/functions-aggregate/src/array_agg.rs
index b830588d40..c4574bf042 100644
--- a/datafusion/functions-aggregate/src/array_agg.rs
+++ b/datafusion/functions-aggregate/src/array_agg.rs
@@ -224,6 +224,10 @@ impl AggregateUDFImpl for ArrayAgg {
         datafusion_expr::ReversedUDAF::Reversed(array_agg_udaf())
     }
 
+    fn supports_null_handling_clause(&self) -> bool {
+        true
+    }
+
     fn documentation(&self) -> Option<&Documentation> {
         self.doc()
     }
diff --git a/datafusion/functions-aggregate/src/first_last.rs 
b/datafusion/functions-aggregate/src/first_last.rs
index 73f2ec112f..d2233fd932 100644
--- a/datafusion/functions-aggregate/src/first_last.rs
+++ b/datafusion/functions-aggregate/src/first_last.rs
@@ -302,6 +302,10 @@ impl AggregateUDFImpl for FirstValue {
         ReversedUDAF::Reversed(last_value_udaf())
     }
 
+    fn supports_null_handling_clause(&self) -> bool {
+        true
+    }
+
     fn documentation(&self) -> Option<&Documentation> {
         self.doc()
     }
@@ -1127,6 +1131,10 @@ impl AggregateUDFImpl for LastValue {
         ReversedUDAF::Reversed(first_value_udaf())
     }
 
+    fn supports_null_handling_clause(&self) -> bool {
+        true
+    }
+
     fn documentation(&self) -> Option<&Documentation> {
         self.doc()
     }
diff --git a/datafusion/functions-aggregate/src/percentile_cont.rs 
b/datafusion/functions-aggregate/src/percentile_cont.rs
index 1e06461e56..9ee788a6dd 100644
--- a/datafusion/functions-aggregate/src/percentile_cont.rs
+++ b/datafusion/functions-aggregate/src/percentile_cont.rs
@@ -358,10 +358,6 @@ impl AggregateUDFImpl for PercentileCont {
         }
     }
 
-    fn supports_null_handling_clause(&self) -> bool {
-        false
-    }
-
     fn supports_within_group_clause(&self) -> bool {
         true
     }
diff --git a/datafusion/sqllogictest/test_files/aggregate.slt 
b/datafusion/sqllogictest/test_files/aggregate.slt
index a593f66c7d..132922e42d 100644
--- a/datafusion/sqllogictest/test_files/aggregate.slt
+++ b/datafusion/sqllogictest/test_files/aggregate.slt
@@ -844,6 +844,13 @@ SELECT approx_median(distinct col_i8) FROM median_table
 statement error DataFusion error: This feature is not implemented: 
APPROX_MEDIAN\(DISTINCT\) aggregations are not available
 SELECT approx_median(col_i8), approx_median(distinct col_i8) FROM median_table
 
+# null handling clauses not supported
+query error DataFusion error: Error during planning: \[IGNORE \| RESPECT\] 
NULLS are not permitted for median
+SELECT median(c2) IGNORE NULLS FROM aggregate_test_100
+
+query error DataFusion error: Error during planning: \[IGNORE \| RESPECT\] 
NULLS are not permitted for median
+SELECT median(c2) RESPECT NULLS FROM aggregate_test_100
+
 # median_i16
 query I
 SELECT median(col_i16) FROM median_table
diff --git a/docs/source/library-user-guide/upgrading.md 
b/docs/source/library-user-guide/upgrading.md
index f08e2c383a..7cc84f424b 100644
--- a/docs/source/library-user-guide/upgrading.md
+++ b/docs/source/library-user-guide/upgrading.md
@@ -19,11 +19,34 @@
 
 # Upgrade Guides
 
-## DataFusion `51.0.0`
+## DataFusion `52.0.0`
+
+**Note:** DataFusion `52.0.0` has not been released yet. The information 
provided in this section pertains to features and changes that have already 
been merged to the main branch and are awaiting release in this version.
+
+You can see the current [status of the `52.0.0` release 
here](https://github.com/apache/datafusion/issues/18566)
+
+### `AggregateUDFImpl::supports_null_handling_clause` now defaults to `false`
+
+This method specifies whether an aggregate function allows `IGNORE 
NULLS`/`RESPECT NULLS`
+during SQL parsing, with the implication it respects these configs during 
computation.
+
+Most DataFusion aggregate functions silently ignored this syntax in prior 
versions
+as they did not make use of it and it was permitted by default. We change this 
so
+only the few functions which do respect this clause (e.g. `array_agg`, 
`first_value`,
+`last_value`) need to implement it.
 
-**Note:** DataFusion `51.0.0` has not been released yet. The information 
provided in this section pertains to features and changes that have already 
been merged to the main branch and are awaiting release in this version.
+Custom user defined aggregate functions will also error if this syntax is used,
+unless they explicitly declare support by overriding the method.
 
-You can see the current [status of the `51.0.0`release 
here](https://github.com/apache/datafusion/issues/17558)
+For example, SQL parsing will now fail for queries such as this:
+
+```sql
+SELECT median(c1) IGNORE NULLS FROM table
+```
+
+Instead of silently succeeding.
+
+## DataFusion `51.0.0`
 
 ### `arrow` / `parquet` updated to 57.0.0
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to