kosiew opened a new pull request, #22823:
URL: https://github.com/apache/datafusion/pull/22823

   ## Which issue does this PR close?
   
   Closes #22685
   
   ## Rationale for this change
   
   DATE_BIN converts timestamp and TIME values to nanoseconds in several code 
paths. Timestamp paths already performed checked scaling, but the logic was 
duplicated across scalar and array handling, while TIME paths still used direct 
multiplication that could overflow on extreme inputs.
   
   This change centralizes value-to-nanosecond scaling in a single helper and 
applies it consistently across timestamp and TIME branches. As a result, 
overflow conditions now return a normal error instead of panicking or silently 
wrapping, while preserving existing DATE_BIN binning behavior. 
   
   ## What changes are included in this PR?
   
   * Added a private `value_to_nanos(value, scale)` helper that performs 
checked multiplication and returns a DATE_BIN overflow error on failure.
   * Renamed the overflow helper from `timestamp_scale_overflow_error` to the 
more general `nanos_scale_overflow_error`.
   * Replaced duplicated timestamp scaling logic in scalar DATE_BIN paths with 
calls to `value_to_nanos`.
   * Replaced duplicated timestamp scaling logic in array DATE_BIN paths with 
calls to `value_to_nanos`.
   * Replaced direct TIME origin scaling for:
   
     * `Time32Millisecond`
     * `Time32Second`
     * `Time64Microsecond`
       with checked conversion through `value_to_nanos`.
   * Reworked TIME scalar source handling to propagate scaling overflow errors 
through the existing `Result` boundary.
   * Updated TIME array source handling to use `value_to_nanos`, converting 
scaling failures into `ArrowError::ComputeError` at the `try_unary` boundary.
   * Preserved existing DATE_BIN behavior for non-scaling binning failures 
(e.g. mapping `stride_fn` failures to `None`/NULL where previously expected). 
   
   ## Are these changes tested?
   
   Yes.
   
   Added focused regression tests covering overflow during TIME-to-nanosecond 
scaling:
   
   * `test_date_bin_time64_microsecond_scalar_scaling_overflow_reproducer`
   * `test_date_bin_time64_microsecond_array_scaling_overflow_reproducer`
   
   These tests verify that `DATE_BIN` with `Time64Microsecond(i64::MAX)` 
returns a normal error containing:
   
   ```
   DATE_BIN value 9223372036854775807 cannot be represented in nanoseconds
   ```
   
   and does not panic. 
   
   ## Are there any user-facing changes?
   
   Yes. For extreme TIME values that overflow when converted to nanoseconds, 
`DATE_BIN` now returns a regular execution/compute error instead of risking a 
panic or overflow behavior. Existing DATE_BIN semantics are otherwise 
unchanged. 
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to