kosiew commented on code in PR #22408:
URL: https://github.com/apache/datafusion/pull/22408#discussion_r3287769157
##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -328,20 +328,35 @@ fn date_bin_nanos_interval(stride_nanos: i64, source:
i64, origin: i64) -> Resul
})?;
// distance from origin to bin
- let time_delta = compute_distance(time_diff, stride_nanos);
+ let time_delta = compute_distance(time_diff, stride_nanos)?;
- Ok(origin + time_delta)
+ origin.checked_add(time_delta).ok_or_else(|| {
+ arrow::error::ArrowError::InvalidArgumentError(format!(
+ "date_bin origin {origin} + delta {time_delta} overflows i64"
+ ))
+ .into()
+ })
}
// distance from origin to bin
-fn compute_distance(time_diff: i64, stride: i64) -> i64 {
- let time_delta = time_diff - (time_diff % stride);
+fn compute_distance(time_diff: i64, stride: i64) -> Result<i64> {
+ let remainder = time_diff % stride;
Review Comment:
Nice catch making this path fallible overall. I think there’s still one
unchecked overflow case left here though.
`time_diff % stride` can still panic when `time_diff == i64::MIN` and
`stride == -1`. Right now `date_bin_impl` only rejects `stride == 0`, so a
negative interval can still reach this code path and panic instead of returning
the normal error/NULL behavior this fix is aiming for.
Could we switch this to `checked_rem` here, or alternatively reject
non-positive strides before calling `compute_distance`?
It would also be great to add a regression test covering the `i64::MIN, -1`
case.
##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -1341,4 +1356,39 @@ mod tests {
assert!(val.is_none(), "Expected None for out of range operation");
}
}
+
+ #[test]
+ fn test_date_bin_compute_distance_i64_min() {
Review Comment:
The unit test coverage here looks good for the UDF invocation path. Since
this is SQL-visible behavior and there are already `date_bin` SLTs in
`datafusion/sqllogictest/test_files/datetime/timestamps.slt`, it might be worth
adding an end-to-end SLT case as well.
That would help verify the problematic scalar input returns `NULL` instead
of panicking through the SQL execution path too.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]