This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 54ff4ea05135 [SPARK-55322][SQL][TESTS][FOLLOWUP] Fix `max_by and min_by with k` failure when ANSI mode is disabled
54ff4ea05135 is described below
commit 54ff4ea05135e2366f271b5aaeaac45b4eba6481
Author: yangjie01 <[email protected]>
AuthorDate: Wed Feb 25 09:19:10 2026 -0800
[SPARK-55322][SQL][TESTS][FOLLOWUP] Fix `max_by and min_by with k` failure when ANSI mode is disabled
### What changes were proposed in this pull request?
This PR updates a test case in `DataFrameAggregateSuite` for the `max_by` and `min_by` functions. Specifically, it makes the assertion for an invalid `k` input (a non-numeric string) depend on `spark.sql.ansi.enabled`, because the query fails differently in the two modes:
- **ANSI Enabled**: Expects a `CAST_INVALID_INPUT` or "cannot be cast" error, because the string `'two'` cannot be cast to an integer.
- **ANSI Disabled**: Expects a `VALUE_OUT_OF_RANGE` error. In legacy mode the invalid cast does not raise an error; instead `k` is evaluated as `0` (see `current value = 0` in the log below), which fails validation because `k` must be between 1 and 100000.
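For illustration only, the two paths above can be modeled in plain Scala. This is a hypothetical, simplified sketch: `MaxByKModel` and `resolveK` are made-up names, the messages only approximate Spark's error text, it does not call Spark, and it assumes Scala 2.13+ (for `toIntOption`). Following the logged run, it treats an unparseable `k` as `0` in non-ANSI mode.

```scala
// Hypothetical model of how an invalid `k` literal fails in each mode.
// Not Spark's implementation; messages are illustrative approximations.
object MaxByKModel {
  val MinK = 1
  val MaxK = 100000

  /** Resolves the literal `k` argument, returning Left(errorMessage) on failure. */
  def resolveK(literal: String, ansiEnabled: Boolean): Either[String, Int] =
    literal.toIntOption match {
      case None if ansiEnabled =>
        // ANSI mode: casting a non-numeric string fails immediately.
        Left(s"[CAST_INVALID_INPUT] The value '$literal' cannot be cast to INT.")
      case parsed =>
        // Non-ANSI mode (or a numeric literal): no cast error is raised.
        // Assumption taken from the logged run: an unparseable k is seen as 0.
        val k = parsed.getOrElse(0)
        if (k < MinK || k > MaxK)
          Left(s"[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] The `k` must be between [$MinK, $MaxK] (current value = $k).")
        else Right(k)
    }

  def main(args: Array[String]): Unit = {
    // Prints a cast error first, then an out-of-range error.
    println(resolveK("two", ansiEnabled = true))
    println(resolveK("two", ansiEnabled = false))
  }
}
```

The point of the model is that the same input produces two different error classes, which is why a single assertion cannot cover both modes.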
### Why are the changes needed?
To restore the daily test run in non-ANSI mode, which fails as follows:
- https://github.com/apache/spark/actions/runs/22247813526/job/64365502163
```
[info] - max_by and min_by with k *** FAILED *** (1 second, 431 milliseconds)
[info] "[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "max_by(x, y, two)" due to data type mismatch: The `k` must be between [1, 100000] (current value = 0). SQLSTATE: 42K09; line 1 pos 7;
[info] 'Aggregate [unresolvedalias(max_by(x#628078, y#628079, cast(two as int), false, 0, 0))]
[info] +- SubqueryAlias tab
[info] +- LocalRelation [x#628078, y#628079]
[info] " did not contain "CAST_INVALID_INPUT", and "[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "max_by(x, y, two)" due to data type mismatch: The `k` must be between [1, 100000] (current value = 0). SQLSTATE: 42K09; line 1 pos 7;
[info] 'Aggregate [unresolvedalias(max_by(x#628078, y#628079, cast(two as int), false, 0, 0))]
[info] +- SubqueryAlias tab
[info] +- LocalRelation [x#628078, y#628079]
[info] " did not contain "cannot be cast" (DataFrameAggregateSuite.scala:1386)
...
[info] *** 4 TESTS FAILED ***
[error] Failed: Total 4096, Failed 4, Errors 0, Passed 4092, Ignored 13
[error] Failed tests:
[error] org.apache.spark.sql.SingleLevelAggregateHashMapSuite
[error] org.apache.spark.sql.DataFrameAggregateSuite
[error] org.apache.spark.sql.TwoLevelAggregateHashMapSuite
[error] org.apache.spark.sql.TwoLevelAggregateHashMapWithVectorizedMapSuite
[error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful
```
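As a rough sketch of why the old assertion failed, the snippet below replays the first line of the logged non-ANSI message against both versions of the check. The object and method names are hypothetical; `oldCheck` and `newCheck` only mirror the `contains` conditions from the diff, with the `ansiEnabled` parameter standing in for `conf.ansiEnabled`.

```scala
object ReplayLoggedFailure {
  // First line of the non-ANSI error message, copied from the log above.
  val loggedMessage: String =
    "[DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve \"max_by(x, y, two)\" due to " +
      "data type mismatch: The `k` must be between [1, 100000] (current value = 0). " +
      "SQLSTATE: 42K09; line 1 pos 7;"

  // Condition before this patch: only cast errors are accepted.
  def oldCheck(msg: String): Boolean =
    msg.contains("CAST_INVALID_INPUT") || msg.contains("cannot be cast")

  // Condition after this patch: the expected error depends on the ANSI setting.
  def newCheck(msg: String, ansiEnabled: Boolean): Boolean =
    if (ansiEnabled) oldCheck(msg) else msg.contains("VALUE_OUT_OF_RANGE")

  def main(args: Array[String]): Unit = {
    println(oldCheck(loggedMessage))                      // false
    println(newCheck(loggedMessage, ansiEnabled = false)) // true
  }
}
```

Note that "Cannot resolve" does not satisfy `contains("cannot be cast")`, so the old disjunction is false for this message, matching the two "did not contain" clauses in the log.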
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually verified by running `SPARK_ANSI_SQL_MODE=false build/sbt "sql/testOnly org.apache.spark.sql.SingleLevelAggregateHashMapSuite org.apache.spark.sql.DataFrameAggregateSuite org.apache.spark.sql.TwoLevelAggregateHashMapSuite org.apache.spark.sql.TwoLevelAggregateHashMapWithVectorizedMapSuite"`; all tests pass.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #54484 from LuciferYang/SPARK-55322-FOLLOWUP.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
index 64b33ccb89a2..f606c6746f3c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
@@ -1383,8 +1383,12 @@ class DataFrameAggregateSuite extends QueryTest
val error = intercept[Exception] {
sql(s"SELECT $fn(x, y, 'two') FROM VALUES (('a', 10)) AS tab(x, y)").collect()
}
- assert(error.getMessage.contains("CAST_INVALID_INPUT") ||
- error.getMessage.contains("cannot be cast"))
+ if (conf.ansiEnabled) {
+ assert(error.getMessage.contains("CAST_INVALID_INPUT") ||
+ error.getMessage.contains("cannot be cast"))
+ } else {
+ assert(error.getMessage.contains("VALUE_OUT_OF_RANGE"))
+ }
}
// Error: k must be positive