bvolpato opened a new pull request, #22664:
URL: https://github.com/apache/datafusion/pull/22664

   ## Which issue does this PR close?
   
   - Closes #22661.
   
   ## Rationale for this change
   
   Substrait logical-plan conversion can currently return a valid window plan 
with
   different semantics from its input. On current `main` (`496f2c2065c0`), a
   round trip of:
   
   ```sql
   SELECT array_agg(val) IGNORE NULLS OVER (
     ORDER BY ts ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
   )
   ```
   
   changes expected output `[A], [A], [C], [C], [E]` to
   `[A], [A, NULL], [NULL, C], [C, NULL], [NULL, E]`.
   
   Likewise, `window.slt:909` uses finite interval `RANGE` bounds; round trip
   changes bounded counts such as `6, 2, 1` into partition-wide counts `8, 8, 
8`.
   Returning changed results is worse than reporting unsupported conversion.
   
   ## What changes are included in this PR?
   
   - Encode and decode window `DISTINCT` through Substrait
     `AggregationInvocation`.
   - Reject explicit window null treatment and `FILTER`, which the current
     Substrait window expression path does not encode.
   - Reject finite frame offsets which cannot be represented as Substrait 
integer
     offsets instead of widening them to `UNBOUNDED`.
   - Add logical round-trip coverage for `DISTINCT` and fail-closed coverage for
     `IGNORE NULLS`, `RESPECT NULLS`, `FILTER`, and interval `RANGE` frames.
   
   ## Are these changes tested?
   
   - `cargo fmt --all`
   - `cargo clippy --all-targets --all-features -- -D warnings`
   - `cargo test -p datafusion-substrait`
   
   Pre-fix reproduction:
   
   ```bash
   cargo test -p datafusion-sqllogictest --test sqllogictests --features 
substrait -- --substrait-round-trip array_agg_sliding_window.slt:150 
window.slt:909
   ```
   
   - `array_agg_sliding_window.slt:150` produced null-containing arrays despite
     `IGNORE NULLS`.
   - `window.slt:909` produced partition-wide counts for finite interval bounds.
   
   Post-fix targeted checks:
   
   ```bash
   cargo test -p datafusion-sqllogictest --test sqllogictests --features 
substrait -- --substrait-round-trip array_agg_sliding_window.slt:150
   cargo test -p datafusion-sqllogictest --test sqllogictests --features 
substrait -- --substrait-round-trip window.slt:909
   ```
   
   - The first now fails conversion with `Window functions with IGNORE NULLS are
     not supported in Substrait`.
   - The second now fails conversion with `Unsupported Substrait window frame
     offset: 1 DAY`.
   - Selecting `window.slt:909` also runs unrelated existing `EXPLAIN` records 
in
     that fixture, which report separate plan-output mismatches after the target
     conversion error.
   
   ## Are there any user-facing changes?
   
   Substrait producers now return explicit unsupported-feature errors for window
   null treatment, window `FILTER`, and finite non-integer window-frame offsets
   instead of producing plans with altered semantics. Window `DISTINCT` now
   round-trips correctly.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to