bvolpato opened a new pull request, #22664:
URL: https://github.com/apache/datafusion/pull/22664
## Which issue does this PR close?
- Closes #22661.
## Rationale for this change
Substrait logical-plan conversion can currently return a valid window plan
with
different semantics from its input. On current `main` (`496f2c2065c0`), a
round trip of:
```sql
SELECT array_agg(val) IGNORE NULLS OVER (
ORDER BY ts ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
)
```
changes expected output `[A], [A], [C], [C], [E]` to
`[A], [A, NULL], [NULL, C], [C, NULL], [NULL, E]`.
Likewise, `window.slt:909` uses finite interval `RANGE` bounds; round trip
changes bounded counts such as `6, 2, 1` into partition-wide counts `8, 8,
8`.
Returning changed results is worse than reporting unsupported conversion.
## What changes are included in this PR?
- Encode and decode window `DISTINCT` through Substrait
`AggregationInvocation`.
- Reject explicit window null treatment and `FILTER`, which the current
Substrait window expression path does not encode.
- Reject finite frame offsets which cannot be represented as Substrait
integer
offsets instead of widening them to `UNBOUNDED`.
- Add logical round-trip coverage for `DISTINCT` and fail-closed coverage for
`IGNORE NULLS`, `RESPECT NULLS`, `FILTER`, and interval `RANGE` frames.
## Are these changes tested?
- `cargo fmt --all`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test -p datafusion-substrait`
Pre-fix reproduction:
```bash
cargo test -p datafusion-sqllogictest --test sqllogictests --features
substrait -- --substrait-round-trip array_agg_sliding_window.slt:150
window.slt:909
```
- `array_agg_sliding_window.slt:150` produced null-containing arrays despite
`IGNORE NULLS`.
- `window.slt:909` produced partition-wide counts for finite interval bounds.
Post-fix targeted checks:
```bash
cargo test -p datafusion-sqllogictest --test sqllogictests --features
substrait -- --substrait-round-trip array_agg_sliding_window.slt:150
cargo test -p datafusion-sqllogictest --test sqllogictests --features
substrait -- --substrait-round-trip window.slt:909
```
- The first now fails conversion with `Window functions with IGNORE NULLS are
not supported in Substrait`.
- The second now fails conversion with `Unsupported Substrait window frame
offset: 1 DAY`.
- Selecting `window.slt:909` also runs unrelated existing `EXPLAIN` records
in
that fixture, which report separate plan-output mismatches after the target
conversion error.
## Are there any user-facing changes?
Substrait producers now return explicit unsupported-feature errors for window
null treatment, window `FILTER`, and finite non-integer window-frame offsets
instead of producing plans with altered semantics. Window `DISTINCT` now
round-trips correctly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]