kosiew opened a new pull request, #22823:
URL: https://github.com/apache/datafusion/pull/22823
## Which issue does this PR close?
Closes #22685
## Rationale for this change
DATE_BIN converts timestamp and TIME values to nanoseconds in several code
paths. Timestamp paths already performed checked scaling, but the logic was
duplicated across scalar and array handling, while TIME paths still used direct
multiplication that could overflow on extreme inputs.
This change centralizes value-to-nanosecond scaling in a single helper and
applies it consistently across timestamp and TIME branches. As a result,
overflow conditions now return a normal error instead of panicking or silently
wrapping, while preserving existing DATE_BIN binning behavior.
## What changes are included in this PR?
* Added a private `value_to_nanos(value, scale)` helper that performs
checked multiplication and returns a DATE_BIN overflow error on failure.
* Renamed the overflow helper from `timestamp_scale_overflow_error` to the
more general `nanos_scale_overflow_error`.
* Replaced duplicated timestamp scaling logic in scalar DATE_BIN paths with
calls to `value_to_nanos`.
* Replaced duplicated timestamp scaling logic in array DATE_BIN paths with
calls to `value_to_nanos`.
* Replaced direct TIME origin scaling for:
* `Time32Millisecond`
* `Time32Second`
* `Time64Microsecond`
with checked conversion through `value_to_nanos`.
* Reworked TIME scalar source handling to propagate scaling overflow errors
through the existing `Result` boundary.
* Updated TIME array source handling to use `value_to_nanos`, converting
scaling failures into `ArrowError::ComputeError` at the `try_unary` boundary.
* Preserved existing DATE_BIN behavior for non-scaling binning failures
(e.g. mapping `stride_fn` failures to `None`/NULL where previously expected).
## Are these changes tested?
Yes.
Added focused regression tests covering overflow during TIME-to-nanosecond
scaling:
* `test_date_bin_time64_microsecond_scalar_scaling_overflow_reproducer`
* `test_date_bin_time64_microsecond_array_scaling_overflow_reproducer`
These tests verify that `DATE_BIN` with `Time64Microsecond(i64::MAX)`
returns a normal error containing:
```
DATE_BIN value 9223372036854775807 cannot be represented in nanoseconds
```
and does not panic.
## Are there any user-facing changes?
Yes. For extreme TIME values that overflow when converted to nanoseconds,
`DATE_BIN` now returns a regular execution/compute error instead of risking a
panic or overflow behavior. Existing DATE_BIN semantics are otherwise
unchanged.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]