comphead opened a new pull request, #22140:
URL: https://github.com/apache/datafusion/pull/22140
## Which issue does this PR close?
<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases. You can
link an issue to this PR using the GitHub syntax. For example `Closes #123`
indicates that this PR will close issue #123.
-->
- Closes #22137 .
## Rationale for this change
`RANGE` window frames with a value offset (e.g. `RANGE BETWEEN 4 PRECEDING
AND 4 FOLLOWING`) panicked with `attempt to add/subtract with overflow`
whenever the boundary target (`value ± delta`) wrapped
past the type's representable range. Affected inputs include values close
to `i64::MAX`/`i64::MIN`, `u64::MAX`, and any analogous boundary for other
integer/decimal/timestamp types.
Two `// TODO: Handle ... overflows.` markers in
`WindowFrameStateRange::calculate_index_of_row` had been left for this case;
the unchecked `ScalarValue::add` / `sub` silently wrapped the target, after
which `search_in_slice` was handed a nonsensical (wrapped) value and
downstream code tripped a debug-assert subtraction in
`functions-window/src/nth_value.rs`.
Semantically, an overflowed boundary is *unbounded* with respect to the
data in the partition — every real value lies strictly inside the wrapped
sentinel — so the correct behavior is to collapse the
search to the appropriate partition edge rather than to search with a
wrapped target.
## What changes are included in this PR?
`datafusion/expr/src/window_state.rs`
- Replace `ScalarValue::add` / `sub` with their `*_checked` counterparts
in the boundary computation.
- On overflow, short-circuit to the correct partition edge: `search_start`
for `PRECEDING`-direction searches, `length` for `FOLLOWING`-direction
searches. The collapse direction depends only on the
const-generic `SEARCH_SIDE` (the add branch and sub branch both reduce to
`!SEARCH_SIDE` once you expand the `SEARCH_SIDE == is_descending` invariant
that selects each arithmetic branch).
- The pre-existing `value.is_unsigned() && value < delta` clamp-to-zero
path for unsigned subtraction is preserved — it produces a valid polymorphic
zero, not an overflow sentinel.
- No behavior change on the non-overflow path.
`datafusion/sqllogictest/test_files/window.slt`
Regression coverage for positive and negative overflow, across:
- `ASC` + `FOLLOWING` / `ASC` + `PRECEDING` / `DESC` + `PRECEDING` /
`DESC` + `FOLLOWING` (each overflow direction occurs on both sort orders
depending on which arithmetic branch is taken)
- Symmetric `N PRECEDING AND N FOLLOWING` frames where only one side
overflows
- Signed (`i64`) and unsigned (`u64`) ordering columns
- `first_value` and `last_value` both exercised to verify both frame edges
- `ROWS` frame regression guard to document that the pre-existing
`saturating_sub` / `min(length)` saturation behavior is unchanged.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]