kosiew commented on code in PR #22610:
URL: https://github.com/apache/datafusion/pull/22610#discussion_r3372639515
##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -597,91 +564,47 @@ fn date_bin_impl(
return exec_err!("DATE_BIN stride must be non-zero");
}
- fn timestamp_scale<T: ArrowTimestampType>() -> i64 {
- match T::UNIT {
- Nanosecond => 1,
- Microsecond => NANOS_PER_MICRO,
- Millisecond => NANOS_PER_MILLI,
- Second => NANOSECONDS,
+ fn stride_map_fn<T: ArrowTimestampType>(
+ origin: i64,
+ stride: i64,
+ stride_fn: BinFunction,
+ ) -> impl Fn(i64) -> Result<i64> {
+ let scale = timestamp_scale(T::UNIT);
+ move |x: i64| {
+ Ok(stride_fn(stride, checked_scale_to_nanos(x, scale)?, origin)? /
scale)
}
}
- fn timestamp_scale_overflow_error(x: i64) -> DataFusionError {
- DataFusionError::Execution(format!(
- "DATE_BIN source timestamp {x} cannot be represented in
nanoseconds"
- ))
- }
-
Ok(match array {
ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(v, tz_opt)) => {
Review Comment:
Nice cleanup overall. One thought: the four scalar timestamp arms now have
the same control flow and only differ by the `ScalarValue` variant and Arrow
timestamp type. It might be worth introducing a small typed helper or macro
here so the scalar path mirrors `transform_array_with_stride`. That would make
it easier to keep the scalar and array paths in sync and reduce the chance of
them drifting apart in a future change.
##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -1433,6 +1367,101 @@ mod tests {
}
}
+ #[test]
+ fn test_date_bin_scale_overflow_returns_null() {
+ // Scaling non-nanosecond timestamps to nanoseconds can overflow.
+ use arrow::array::{
+ ArrayRef, TimestampMicrosecondArray, TimestampMillisecondArray,
+ TimestampSecondArray,
+ };
+
+ let return_field = &Arc::new(Field::new(
Review Comment:
Small test suggestion: `test_date_bin_scale_overflow_returns_null` currently
creates a single `return_field` with `Timestamp(Second)` while also exercising
millisecond and microsecond source cases. The test passes today because the
implementation does not use `return_field`, but it might be a little more
robust to build the field from each case's `expected_type`. That way the test
stays accurate if `invoke_with_args` ever starts validating or using the return
field.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]