This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 959cc9504835 [SPARK-55702][SQL][FOLLOWUP] Clean up dead error code and fix flaky window filter test
959cc9504835 is described below
commit 959cc95048354dc47668d05268b388397e282fa1
Author: Wenchen Fan <[email protected]>
AuthorDate: Sat Feb 28 23:01:42 2026 +0800
[SPARK-55702][SQL][FOLLOWUP] Clean up dead error code and fix flaky window filter test
### What changes were proposed in this pull request?
Follow-up to #54501. Two cleanups:
1. **Remove dead error code**: The
`windowAggregateFunctionWithFilterNotSupportedError` method in
`QueryCompilationErrors.scala` and its `_LEGACY_ERROR_TEMP_1030` error class in
`error-conditions.json` were left behind after #54501 removed their only call
site.
2. **Fix flaky `first_value`/`last_value` test**: The window filter test
used `ORDER BY val_long` with a ROWS frame, but `val_long` has duplicate values
in the test data (e.g., three rows with `val_long=1`), making
`first_value`/`last_value` results non-deterministic. Added `val` and `cate` as
tiebreaker columns and used `NULLS LAST` so the output is both stable and
meaningful (without NULLS LAST, the first matching 'a' row has `val=NULL`,
making `first_a` always NULL).
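
The tiebreaker reasoning in item 2 can be illustrated outside Spark. A minimal Python sketch (hypothetical rows and helper names, not the actual test harness or Spark's execution): sorting by a non-unique key leaves tied rows in whatever physical order the input had, so `first_value` over that order is engine-dependent; adding tiebreaker columns with NULLs sorted last makes the order total and the result stable.

```python
# Hypothetical sketch (plain Python, not Spark): rows are (val, cate, val_long);
# val_long=1 repeats, and one 'a' row has val=None, mirroring the test data.
rows = [(2, "a", 1), (None, "a", 1), (1, "a", 1), (3, "b", 2)]
# Same rows in a different physical order (an engine may produce either).
shuffled = [(1, "a", 1), (None, "a", 1), (2, "a", 1), (3, "b", 2)]

def first_a(data, key):
    """Emulate first_value(val) FILTER (WHERE cate = 'a') over the sorted order."""
    for val, cate, _ in sorted(data, key=key):
        if cate == "a":
            return val
    return None

# ORDER BY val_long only: Python's sort is stable, so tied rows keep their
# input order, and the answer depends on that physical order (2 vs 1 here).
flaky_key = lambda r: r[2]
assert first_a(rows, flaky_key) != first_a(shuffled, flaky_key)

# ORDER BY val_long, val NULLS LAST, cate: the order is total, so the result
# is the same regardless of physical row order (and not NULL, because the
# val=None row no longer sorts first among the ties).
nulls_last = lambda v: (v is None, v)
stable_key = lambda r: (r[2], nulls_last(r[0]), r[1])
assert first_a(rows, stable_key) == first_a(shuffled, stable_key) == 1
```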
### Why are the changes needed?
1. Dead code should be cleaned up.
2. Non-deterministic tests can cause spurious failures.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Re-ran `SQLQueryTestSuite` for `window.sql` — all 4 tests pass across all
config dimensions.
### Was this patch authored or co-authored using generative AI tooling?
Yes. Cursor.
Closes #54557 from cloud-fan/follow.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
common/utils/src/main/resources/error/error-conditions.json | 5 -----
.../org/apache/spark/sql/errors/QueryCompilationErrors.scala | 6 ------
.../test/resources/sql-tests/analyzer-results/window.sql.out | 10 +++++-----
sql/core/src/test/resources/sql-tests/inputs/window.sql | 6 +++---
sql/core/src/test/resources/sql-tests/results/window.sql.out | 12 ++++++------
5 files changed, 14 insertions(+), 25 deletions(-)
diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json
index b76e3b5c8d56..6c2a648ec52e 100644
--- a/common/utils/src/main/resources/error/error-conditions.json
+++ b/common/utils/src/main/resources/error/error-conditions.json
@@ -8008,11 +8008,6 @@
"count(<targetString>.*) is not allowed. Please use count(*) or expand the columns manually, e.g. count(col1, col2)."
]
},
- "_LEGACY_ERROR_TEMP_1030" : {
- "message" : [
- "Window aggregate function with filter predicate is not supported yet."
- ]
- },
"_LEGACY_ERROR_TEMP_1031" : {
"message" : [
"It is not allowed to use a window function inside an aggregate function. Please use the inner window function in a sub-query."
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 8cdd734def4a..edf2dfe545c7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -836,12 +836,6 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase with Compilat
messageParameters = Map("expression" -> expression))
}
- def windowAggregateFunctionWithFilterNotSupportedError(): Throwable = {
- new AnalysisException(
- errorClass = "_LEGACY_ERROR_TEMP_1030",
- messageParameters = Map.empty)
- }
-
def windowFunctionInsideAggregateFunctionNotAllowedError(): Throwable = {
new AnalysisException(
errorClass = "_LEGACY_ERROR_TEMP_1031",
diff --git a/sql/core/src/test/resources/sql-tests/analyzer-results/window.sql.out b/sql/core/src/test/resources/sql-tests/analyzer-results/window.sql.out
index 76c0fb1919ce..11240c52e9c8 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/window.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/window.sql.out
@@ -688,17 +688,17 @@ Project [cate#x, sum(val) OVER (PARTITION BY cate ORDER BY val ASC NULLS FIRST R
-- !query
SELECT val, cate,
-first_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long
+first_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_a,
-last_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long
+last_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_a
-FROM testData ORDER BY val_long, cate
+FROM testData ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST
-- !query analysis
Project [val#x, cate#x, first_a#x, last_a#x]
-+- Sort [val_long#xL ASC NULLS FIRST, cate#x ASC NULLS FIRST], true
++- Sort [val_long#xL ASC NULLS LAST, val#x ASC NULLS LAST, cate#x ASC NULLS LAST], true
+- Project [val#x, cate#x, first_a#x, last_a#x, val_long#xL]
+- Project [val#x, cate#x, _w0#x, val_long#xL, first_a#x, last_a#x, first_a#x, last_a#x]
- +- Window [first_value(val#x, false) FILTER (WHERE _w0#x) windowspecdefinition(val_long#xL ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS first_a#x, last_value(val#x, false) FILTER (WHERE _w0#x) windowspecdefinition(val_long#xL ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS last_a#x], [val_long#xL ASC NULLS FIRST]
+ +- Window [first_value(val#x, false) FILTER (WHERE _w0#x) windowspecdefinition(val_long#xL ASC NULLS LAST, val#x ASC NULLS LAST, cate#x ASC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS first_a#x, last_value(val#x, false) FILTER (WHERE _w0#x) windowspecdefinition(val_long#xL ASC NULLS LAST, val#x ASC NULLS LAST, cate#x ASC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS last_a#x], [val_long#xL ASC NULLS [...]
+- Project [val#x, cate#x, (cate#x = a) AS _w0#x, val_long#xL]
+- SubqueryAlias testdata
+- View (`testData`, [val#x, val_long#xL, val_double#x, val_date#x, val_timestamp#x, cate#x])
diff --git a/sql/core/src/test/resources/sql-tests/inputs/window.sql b/sql/core/src/test/resources/sql-tests/inputs/window.sql
index 586fe88ac305..3a453e1c80e7 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/window.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/window.sql
@@ -184,11 +184,11 @@ WINDOW w AS (PARTITION BY cate ORDER BY val);
-- window aggregate with filter predicate: first_value/last_value (imperative aggregate)
SELECT val, cate,
-first_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long
+first_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_a,
-last_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long
+last_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_a
-FROM testData ORDER BY val_long, cate;
+FROM testData ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST;
-- window aggregate with filter predicate: multiple aggregates with different filters
SELECT val, cate,
diff --git a/sql/core/src/test/resources/sql-tests/results/window.sql.out b/sql/core/src/test/resources/sql-tests/results/window.sql.out
index 44c3b175868d..3ee7673df641 100644
--- a/sql/core/src/test/resources/sql-tests/results/window.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/window.sql.out
@@ -669,23 +669,23 @@ b 6
-- !query
SELECT val, cate,
-first_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long
+first_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_a,
-last_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long
+last_value(val) FILTER (WHERE cate = 'a') OVER(ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_a
-FROM testData ORDER BY val_long, cate
+FROM testData ORDER BY val_long NULLS LAST, val NULLS LAST, cate NULLS LAST
-- !query schema
struct<val:int,cate:string,first_a:int,last_a:int>
-- !query output
-NULL NULL 1 NULL
-1 b 1 NULL
+1 a 1 1
3 NULL 1 1
NULL a 1 NULL
1 a 1 1
-1 a 1 1
2 b 1 1
2 a 1 2
3 b 1 2
+1 b 1 2
+NULL NULL 1 2
-- !query
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]