kosiew opened a new issue, #22685:
URL: https://github.com/apache/datafusion/issues/22685

   ## Summary
   DATE_BIN currently performs timestamp unit scaling in multiple call sites 
before invoking the binning function. This logic is duplicated across scalar 
and array paths and primarily uses direct multiplication. The existing approach 
increases maintenance cost and risks inconsistent overflow/error behavior 
across code paths.
   
   
   ## Problem Statement
   The DATE_BIN implementation contains repeated patterns that convert input 
values to nanoseconds using expressions like value * scale in multiple closures 
and match arms.
   
   Examples include:
   - Timestamp scalar mapping logic in 
[datafusion/functions/src/datetime/date_bin.rs](datafusion/functions/src/datetime/date_bin.rs#L593)
   - Timestamp array mapping logic in 
[datafusion/functions/src/datetime/date_bin.rs](datafusion/functions/src/datetime/date_bin.rs#L719)
   - Time scalar paths in 
[datafusion/functions/src/datetime/date_bin.rs](datafusion/functions/src/datetime/date_bin.rs#L648)
   - Time array paths in 
[datafusion/functions/src/datetime/date_bin.rs](datafusion/functions/src/datetime/date_bin.rs#L758)
   
   Because scaling is spread across many locations:
   - Overflow handling is not centrally enforced by one helper.
   - Error mapping behavior can diverge between scalar and array code paths.
   - Future fixes must be replicated in several places.
   
   ## Why This Matters
   - Correctness: unit scaling is arithmetic on i64 values and can overflow for 
edge inputs.
   - Consistency: DATE_BIN should produce predictable behavior regardless of 
scalar vs array path.
   - Maintainability: duplicated conversion logic increases risk of regressions.
   
   ## Proposed Change
   Introduce a small helper local to DATE_BIN implementation to centralize 
checked scaling:
   
   - Helper shape: value_to_nanos(value: i64, scale: i64) -> Result<i64>
   - Internals: use checked_mul and return a structured execution/compute error 
on overflow.
   - Adoption: replace direct value * scale call sites in DATE_BIN scalar and 
array paths.
   - Error behavior: keep existing external behavior, but standardize internal 
overflow handling and message shape.
   
   Optional follow-on helper for reverse conversion can be considered if it 
reduces duplication further, but initial scope should focus on input scaling.
   
   ## Scope
   In scope:
   - Refactor in 
[datafusion/functions/src/datetime/date_bin.rs](datafusion/functions/src/datetime/date_bin.rs)
   - Update/add unit tests in the same module test section as needed
   
   Out of scope:
   - Broad cross-function refactors outside DATE_BIN
   - Behavioral changes unrelated to scaling overflow handling
   
   ## Acceptance Criteria
   1. All DATE_BIN input scaling to nanoseconds uses the new checked helper.
   2. No direct value * scale remains in DATE_BIN timestamp/time scalar and 
array scaling paths where overflow is possible.
   3. Scalar and array paths produce consistent overflow handling semantics.
   4. Existing DATE_BIN tests continue to pass.
   5. New or updated tests cover at least one overflow edge for scaled input 
conversion.
   
   ## Test Plan
   - Run targeted function tests for date_bin module.
   - Add/adjust unit tests in 
[datafusion/functions/src/datetime/date_bin.rs](datafusion/functions/src/datetime/date_bin.rs)
 to validate helper-based overflow handling.
   - If SQL-visible behavior is affected by error-to-NULL mapping, add/extend 
SQL logic tests in 
[datafusion/sqllogictest/test_files/datetime/timestamps.slt](datafusion/sqllogictest/test_files/datetime/timestamps.slt).
   
   ## Related PR
   #22408 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to