This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 8ef884d3c4 Change default of
`AggregateUDFImpl::supports_null_handling_clause` to `false` (#18441)
8ef884d3c4 is described below
commit 8ef884d3c4cbb5e0c961ab307d1d16aea6836f0c
Author: Jeffrey Vo <[email protected]>
AuthorDate: Thu Nov 13 19:34:11 2025 +1100
Change default of `AggregateUDFImpl::supports_null_handling_clause` to
`false` (#18441)
## Which issue does this PR close?
<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->
- Part of #9924
## Rationale for this change
<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
Setting a default of `true` is too permissive; we can see it allows
specifying it on `median` for example even though that function doesn't
take the config into account. It seems only `array_agg`, `first_value`
and `last_value` actually respect the config this setting handles so it
makes more sense to make this false by default.
## What changes are included in this PR?
<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
Change default of `AggregateUDFImpl::supports_null_handling_clause` to
`false` (from `true`). Adjust `array_agg`, `first_value` and
`last_value` to implement it as `true`.
## Are these changes tested?
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code
If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Existing tests, also added a negative case for `median` (expect SQL
parsing to fail if providing null handling clause).
## Are there any user-facing changes?
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
Behaviour change as default of a trait method is changing. Added section
to upgrade guide.
<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
---
datafusion/expr/src/udaf.rs | 10 +++++---
.../src/approx_percentile_cont.rs | 4 ---
.../src/approx_percentile_cont_with_weight.rs | 4 ---
datafusion/functions-aggregate/src/array_agg.rs | 4 +++
datafusion/functions-aggregate/src/first_last.rs | 8 ++++++
.../functions-aggregate/src/percentile_cont.rs | 4 ---
datafusion/sqllogictest/test_files/aggregate.slt | 7 ++++++
docs/source/library-user-guide/upgrading.md | 29 +++++++++++++++++++---
8 files changed, 52 insertions(+), 18 deletions(-)
diff --git a/datafusion/expr/src/udaf.rs b/datafusion/expr/src/udaf.rs
index 42a5f9b262..e2ae697dee 100644
--- a/datafusion/expr/src/udaf.rs
+++ b/datafusion/expr/src/udaf.rs
@@ -740,10 +740,14 @@ pub trait AggregateUDFImpl: Debug + DynEq + DynHash +
Send + Sync {
ScalarValue::try_from(data_type)
}
- /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` clause,
return true
- /// If the function does not, return false
+ /// If this function supports `[IGNORE NULLS | RESPECT NULLS]` SQL clause,
+ /// return `true`. Otherwise, return `false` which will cause an error to
be
+ /// raised during SQL parsing if these clauses are detected for this
function.
+ ///
+ /// Functions which implement this as `true` are expected to handle the
resulting
+ /// null handling config present in [`AccumulatorArgs`], `ignore_nulls`.
fn supports_null_handling_clause(&self) -> bool {
- true
+ false
}
/// If this function supports the `WITHIN GROUP (ORDER BY column
[ASC|DESC])`
diff --git a/datafusion/functions-aggregate/src/approx_percentile_cont.rs
b/datafusion/functions-aggregate/src/approx_percentile_cont.rs
index 4015abc6ad..6a09d159f6 100644
--- a/datafusion/functions-aggregate/src/approx_percentile_cont.rs
+++ b/datafusion/functions-aggregate/src/approx_percentile_cont.rs
@@ -315,10 +315,6 @@ impl AggregateUDFImpl for ApproxPercentileCont {
Ok(arg_types[0].clone())
}
- fn supports_null_handling_clause(&self) -> bool {
- false
- }
-
fn supports_within_group_clause(&self) -> bool {
true
}
diff --git
a/datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs
b/datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs
index 51891ce7f2..42aba0b95e 100644
--- a/datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs
+++ b/datafusion/functions-aggregate/src/approx_percentile_cont_with_weight.rs
@@ -258,10 +258,6 @@ impl AggregateUDFImpl for ApproxPercentileContWithWeight {
self.approx_percentile_cont.state_fields(args)
}
- fn supports_null_handling_clause(&self) -> bool {
- false
- }
-
fn supports_within_group_clause(&self) -> bool {
true
}
diff --git a/datafusion/functions-aggregate/src/array_agg.rs
b/datafusion/functions-aggregate/src/array_agg.rs
index b830588d40..c4574bf042 100644
--- a/datafusion/functions-aggregate/src/array_agg.rs
+++ b/datafusion/functions-aggregate/src/array_agg.rs
@@ -224,6 +224,10 @@ impl AggregateUDFImpl for ArrayAgg {
datafusion_expr::ReversedUDAF::Reversed(array_agg_udaf())
}
+ fn supports_null_handling_clause(&self) -> bool {
+ true
+ }
+
fn documentation(&self) -> Option<&Documentation> {
self.doc()
}
diff --git a/datafusion/functions-aggregate/src/first_last.rs
b/datafusion/functions-aggregate/src/first_last.rs
index 73f2ec112f..d2233fd932 100644
--- a/datafusion/functions-aggregate/src/first_last.rs
+++ b/datafusion/functions-aggregate/src/first_last.rs
@@ -302,6 +302,10 @@ impl AggregateUDFImpl for FirstValue {
ReversedUDAF::Reversed(last_value_udaf())
}
+ fn supports_null_handling_clause(&self) -> bool {
+ true
+ }
+
fn documentation(&self) -> Option<&Documentation> {
self.doc()
}
@@ -1127,6 +1131,10 @@ impl AggregateUDFImpl for LastValue {
ReversedUDAF::Reversed(first_value_udaf())
}
+ fn supports_null_handling_clause(&self) -> bool {
+ true
+ }
+
fn documentation(&self) -> Option<&Documentation> {
self.doc()
}
diff --git a/datafusion/functions-aggregate/src/percentile_cont.rs
b/datafusion/functions-aggregate/src/percentile_cont.rs
index 1e06461e56..9ee788a6dd 100644
--- a/datafusion/functions-aggregate/src/percentile_cont.rs
+++ b/datafusion/functions-aggregate/src/percentile_cont.rs
@@ -358,10 +358,6 @@ impl AggregateUDFImpl for PercentileCont {
}
}
- fn supports_null_handling_clause(&self) -> bool {
- false
- }
-
fn supports_within_group_clause(&self) -> bool {
true
}
diff --git a/datafusion/sqllogictest/test_files/aggregate.slt
b/datafusion/sqllogictest/test_files/aggregate.slt
index a593f66c7d..132922e42d 100644
--- a/datafusion/sqllogictest/test_files/aggregate.slt
+++ b/datafusion/sqllogictest/test_files/aggregate.slt
@@ -844,6 +844,13 @@ SELECT approx_median(distinct col_i8) FROM median_table
statement error DataFusion error: This feature is not implemented:
APPROX_MEDIAN\(DISTINCT\) aggregations are not available
SELECT approx_median(col_i8), approx_median(distinct col_i8) FROM median_table
+# null handling clauses not supported
+query error DataFusion error: Error during planning: \[IGNORE \| RESPECT\]
NULLS are not permitted for median
+SELECT median(c2) IGNORE NULLS FROM aggregate_test_100
+
+query error DataFusion error: Error during planning: \[IGNORE \| RESPECT\]
NULLS are not permitted for median
+SELECT median(c2) RESPECT NULLS FROM aggregate_test_100
+
# median_i16
query I
SELECT median(col_i16) FROM median_table
diff --git a/docs/source/library-user-guide/upgrading.md
b/docs/source/library-user-guide/upgrading.md
index f08e2c383a..7cc84f424b 100644
--- a/docs/source/library-user-guide/upgrading.md
+++ b/docs/source/library-user-guide/upgrading.md
@@ -19,11 +19,34 @@
# Upgrade Guides
-## DataFusion `51.0.0`
+## DataFusion `52.0.0`
+
+**Note:** DataFusion `52.0.0` has not been released yet. The information
provided in this section pertains to features and changes that have already
been merged to the main branch and are awaiting release in this version.
+
+You can see the current [status of the `52.0.0` release
here](https://github.com/apache/datafusion/issues/18566)
+
+### `AggregateUDFImpl::supports_null_handling_clause` now defaults to `false`
+
+This method specifies whether an aggregate function allows `IGNORE
NULLS`/`RESPECT NULLS`
+during SQL parsing, with the implication it respects these configs during
computation.
+
+Most DataFusion aggregate functions silently ignored this syntax in prior
versions
+as they did not make use of it and it was permitted by default. We change this
so
+only the few functions which do respect this clause (e.g. `array_agg`,
`first_value`,
+`last_value`) need to implement it.
-**Note:** DataFusion `51.0.0` has not been released yet. The information
provided in this section pertains to features and changes that have already
been merged to the main branch and are awaiting release in this version.
+Custom user defined aggregate functions will also error if this syntax is used,
+unless they explicitly declare support by overriding the method.
-You can see the current [status of the `51.0.0`release
here](https://github.com/apache/datafusion/issues/17558)
+For example, SQL parsing will now fail for queries such as this:
+
+```sql
+SELECT median(c1) IGNORE NULLS FROM table
+```
+
+Instead of silently succeeding.
+
+## DataFusion `51.0.0`
### `arrow` / `parquet` updated to 57.0.0
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]