Re: [PR] fix: handle date_bin negative subsecond and overflow cases [datafusion]

via GitHub Mon, 08 Jun 2026 03:57:46 -0700


kosiew commented on code in PR #22610:
URL: https://github.com/apache/datafusion/pull/22610#discussion_r3372639515



##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -597,91 +564,47 @@ fn date_bin_impl(
         return exec_err!("DATE_BIN stride must be non-zero");
     }
 
-    fn timestamp_scale<T: ArrowTimestampType>() -> i64 {
-        match T::UNIT {
-            Nanosecond => 1,
-            Microsecond => NANOS_PER_MICRO,
-            Millisecond => NANOS_PER_MILLI,
-            Second => NANOSECONDS,
+    fn stride_map_fn<T: ArrowTimestampType>(
+        origin: i64,
+        stride: i64,
+        stride_fn: BinFunction,
+    ) -> impl Fn(i64) -> Result<i64> {
+        let scale = timestamp_scale(T::UNIT);
+        move |x: i64| {
+            Ok(stride_fn(stride, checked_scale_to_nanos(x, scale)?, origin)? / 
scale)
         }
     }
 
-    fn timestamp_scale_overflow_error(x: i64) -> DataFusionError {
-        DataFusionError::Execution(format!(
-            "DATE_BIN source timestamp {x} cannot be represented in 
nanoseconds"
-        ))
-    }
-
     Ok(match array {
         ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(v, tz_opt)) => {

Review Comment:
   Nice cleanup overall. One thought: the four scalar timestamp arms now have 
the same control flow and only differ by the `ScalarValue` variant and Arrow 
timestamp type. It might be worth introducing a small typed helper or macro 
here so the scalar path mirrors `transform_array_with_stride`. That would make 
it easier to keep the scalar and array paths in sync and reduce the chance of 
them drifting apart in a future change.



##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -1433,6 +1367,101 @@ mod tests {
         }
     }
 
+    #[test]
+    fn test_date_bin_scale_overflow_returns_null() {
+        // Scaling non-nanosecond timestamps to nanoseconds can overflow.
+        use arrow::array::{
+            ArrayRef, TimestampMicrosecondArray, TimestampMillisecondArray,
+            TimestampSecondArray,
+        };
+
+        let return_field = &Arc::new(Field::new(

Review Comment:
   Small test suggestion: `test_date_bin_scale_overflow_returns_null` currently 
creates a single `return_field` with `Timestamp(Second)` while also exercising 
millisecond and microsecond source cases. The test passes today because the 
implementation does not use `return_field`, but it might be a little more 
robust to build the field from each case's `expected_type`. That way the test 
stays accurate if `invoke_with_args` ever starts validating or using the return 
field.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: handle date_bin negative subsecond and overflow cases [datafusion]

Reply via email to