bvolpato opened a new issue, #22661:
URL: https://github.com/apache/datafusion/issues/22661

   ### Describe the bug
   
   Substrait logical-plan round-trip silently changes window-function semantics.
   
   The producer for `Expression.WindowFunction` currently drops 
`null_treatment`, `distinct`, and `filter`, and the consumer reconstructs them 
as `None`, `false`, and `None`. In addition, the producer emits any 
window-frame offset that cannot be represented as an integer Substrait offset 
as `UNBOUNDED`.
   
   This produces valid plans with incorrect results rather than returning an 
unsupported-feature error.
   
   Affected code:
   
   - `datafusion/substrait/src/logical_plan/producer/expr/window_function.rs`
   - `datafusion/substrait/src/logical_plan/consumer/expr/window_function.rs`
   
   ### To Reproduce
   
   Reproduced on current `main` at `496f2c2065c0a7b03745d1e2a47007f7bbda0b39`.
   
   ```bash
   cargo test -p datafusion-sqllogictest --test sqllogictests --features substrait -- \
     --substrait-round-trip array_agg_sliding_window.slt:150 window.slt:909
   ```
   
   `array_agg(val) IGNORE NULLS OVER (...)` returns nulls after round-trip:
   
   ```text
   expected: [A], [A], [C], [C], [E]
   actual:   [A], [A, NULL], [NULL, C], [C, NULL], [NULL, E]
   ```
   
   Finite interval `RANGE` frames become unbounded:
   
   ```text
   expected: ... 6 1 1 ... 2 1 1 ... 1 1 1
   actual:   ... 8 8 1 ... 8 8 6 ... 8 8 8
   ```
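   
   A minimal sketch of the stricter bound conversion this issue asks for. 
   All types here (`Offset`, `FrameBound`, `SubstraitBound`) and the function 
   `to_substrait_bound` are hypothetical stand-ins for illustration, not the 
   actual DataFusion or Substrait types. The point is only that a finite 
   offset with no integer encoding yields an error instead of `Unbounded`:
   
   ```rust
   use std::convert::TryFrom;

   // Hypothetical stand-in for a frame offset literal.
   #[derive(Debug, Clone, PartialEq)]
   enum Offset {
       Rows(u64),              // plain integer offset
       Interval { days: i32 }, // e.g. RANGE INTERVAL '2' DAY PRECEDING
   }

   // Hypothetical frame bound; `None` means UNBOUNDED.
   #[derive(Debug, Clone, PartialEq)]
   enum FrameBound {
       Preceding(Option<Offset>),
       CurrentRow,
       Following(Option<Offset>),
   }

   // Hypothetical model of a bound with integer-only offsets.
   #[derive(Debug, Clone, PartialEq)]
   enum SubstraitBound {
       Preceding(i64),
       Following(i64),
       CurrentRow,
       Unbounded,
   }

   // Convert an offset to i64, failing when no integer encoding exists.
   fn offset_to_i64(offset: &Offset) -> Result<i64, String> {
       match offset {
           Offset::Rows(n) => i64::try_from(*n)
               .map_err(|_| format!("frame offset {} does not fit in i64", n)),
           Offset::Interval { days } => Err(format!(
               "interval frame offset ({} days) has no integer encoding",
               days
           )),
       }
   }

   // Only a genuinely unbounded frame maps to `Unbounded`; a finite offset
   // that cannot be encoded is reported as an error.
   fn to_substrait_bound(bound: &FrameBound) -> Result<SubstraitBound, String> {
       match bound {
           FrameBound::CurrentRow => Ok(SubstraitBound::CurrentRow),
           FrameBound::Preceding(None) | FrameBound::Following(None) => {
               Ok(SubstraitBound::Unbounded)
           }
           FrameBound::Preceding(Some(o)) => {
               offset_to_i64(o).map(SubstraitBound::Preceding)
           }
           FrameBound::Following(Some(o)) => {
               offset_to_i64(o).map(SubstraitBound::Following)
           }
       }
   }
   ```
   
   Under this sketch an interval `RANGE` frame fails conversion up front, so 
   the round-trip test would report an unsupported feature rather than 
   returning widened aggregates.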
   
   `window.slt:6030` exercises window `FILTER`. The producer destructures 
`filter: _`, and the consumer always returns `filter: None`, so that predicate 
is also discarded during conversion.
   
   `DISTINCT` has a Substrait representation (`WindowFunction.invocation`) but 
is currently produced as unspecified and consumed as `false`.
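   
   A rough sketch of the `DISTINCT` mapping, assuming a local enum that 
   mirrors the three values of Substrait's `AggregationInvocation`. The enum 
   and the helpers `encode_distinct` / `decode_distinct` are hypothetical 
   names for illustration, not the generated Substrait bindings. Treating 
   `Unspecified` as non-distinct on decode is an assumption of this sketch:
   
   ```rust
   // Local copy of the invocation values, for illustration only.
   #[derive(Debug, Clone, Copy, PartialEq)]
   enum AggregationInvocation {
       Unspecified = 0,
       All = 1,
       Distinct = 2,
   }

   // Producer side: always emit an explicit value, never `Unspecified`.
   fn encode_distinct(distinct: bool) -> AggregationInvocation {
       if distinct {
           AggregationInvocation::Distinct
       } else {
           AggregationInvocation::All
       }
   }

   // Consumer side: only `Distinct` sets the flag. `Unspecified` is treated
   // like `All` here (an assumption of this sketch).
   fn decode_distinct(invocation: AggregationInvocation) -> bool {
       matches!(invocation, AggregationInvocation::Distinct)
   }
   ```
   
   With both halves in place, `decode_distinct(encode_distinct(d)) == d` 
   holds for either value of `d`, which is the round-trip property the 
   current code lacks.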
   
   ### Expected behavior
   
   Substrait conversion must not return a plan with different query semantics.
   
   - Encode and decode window `DISTINCT` via `AggregationInvocation`.
   - Reject window `FILTER` and `IGNORE NULLS` while no faithful Substrait 
representation is available.
   - Reject finite window-frame offsets that cannot be represented rather than 
converting them to `UNBOUNDED`.
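   
   The rejection of `FILTER` and `IGNORE NULLS` could look roughly like the 
   following. `WindowFunctionParams`, `NullTreatment`, and 
   `ensure_representable` are hypothetical names used only for this sketch, 
   and the error type is a plain `String` rather than DataFusion's error 
   type. Accepting an explicit `RESPECT NULLS` is an assumption, on the 
   basis that it matches the default behavior:
   
   ```rust
   // Hypothetical stand-in for the null-treatment option.
   #[derive(Debug, Clone, Copy, PartialEq)]
   enum NullTreatment {
       RespectNulls,
       IgnoreNulls,
   }

   // Hypothetical subset of a window function's parameters.
   #[derive(Debug, Clone, PartialEq)]
   struct WindowFunctionParams {
       has_filter: bool,
       null_treatment: Option<NullTreatment>,
   }

   // Fail before producing a plan if a parameter would be silently dropped.
   fn ensure_representable(params: &WindowFunctionParams) -> Result<(), String> {
       if params.has_filter {
           return Err("window FILTER is not supported in Substrait".to_string());
       }
       if params.null_treatment == Some(NullTreatment::IgnoreNulls) {
           return Err("IGNORE NULLS is not supported in Substrait".to_string());
       }
       // `None` and explicit RESPECT NULLS are assumed equivalent here.
       Ok(())
   }
   ```
   
   A producer would call such a check before building the window-function 
   message, so unsupported plans fail with an error instead of converting.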
   
   ### Additional context
   
   The Substrait window-function message has `invocation` and integer offset 
bounds, but no field for DataFusion window `FILTER` or null treatment:
   
   https://substrait.io/expressions/window_functions/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

