geoffreyclaude opened a new pull request, #17378:
URL: https://github.com/apache/datafusion/pull/17378
## Which issue does this PR close?
- Closes #674.
## Rationale for this change
This PR adds SQL-standard FILTER support for window aggregates (aligned with
SQLite section 2.3 semantics), enabling concise conditional aggregation within
window frames without rewriting with CASE. CASE cannot fully replicate FILTER
because FILTER removes rows before the aggregate sees them, while CASE only
nullifies values:
- ARRAY_AGG example: FILTER excludes rows; CASE injects NULLs into the array.
```sql
ARRAY_AGG(c2) FILTER (WHERE c2 >= 2) OVER (...) -- e.g. [2, 3, 4]
ARRAY_AGG(CASE WHEN c2 >= 2 THEN c2 END) OVER (...) -- e.g. [NULL,
NULL, 2, 3, 4]
```
- COUNT example: FILTER keeps the aggregate and intent clear; CASE forces a
different aggregate.
```sql
COUNT(*) FILTER (WHERE pred) OVER (...) -- counts gated rows
-- vs
SUM(CASE WHEN pred THEN 1 ELSE 0 END) OVER (...) -- different
function, different typing/overflow behavior
```
FILTER also composes correctly with DISTINCT and null-treatments because it
gates rows before those steps; CASE introduces NULLs into the input domain,
altering semantics.
## What changes are included in this PR?
- Adds an optional FILTER predicate to aggregate window functions and
threads it from SQL parsing through logical planning into physical execution.
- Enforces that FILTER is only permitted on aggregate window functions;
using it on non-aggregate window functions returns a clear planning error.
- At execution, evaluates the FILTER once per input batch to a boolean mask
and applies it to aggregate inputs before update/retract steps, working for
both bounded and sliding windows and composing with DISTINCT and
NULL-treatments.
- Ensures EXPLAIN and SQL unparsing display the FILTER clause to make intent
and plans clear.
## Are these changes tested?
Yes—coverage includes parsing/planning/execution correctness, negative cases
for invalid usage, and plan readability via EXPLAIN.
## Are there any user-facing changes?
Yes—users can now write FILTER on aggregate window functions. Example:
```sql
SELECT
c1,
SUM(c2) FILTER (WHERE c2 >= 2)
OVER (ORDER BY c1, c2
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_ge_2
FROM t
ORDER BY c1, c2;
```
Using FILTER with non-aggregate window functions (e.g., ROW_NUMBER, LAG)
yields a planning error. This is additive and does not change existing behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]