Jefffrey commented on code in PR #22315:
URL: https://github.com/apache/datafusion/pull/22315#discussion_r3327912717


##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -594,53 +597,97 @@ fn date_bin_impl(
         return exec_err!("DATE_BIN stride must be non-zero");
     }
 
-    fn stride_map_fn<T: ArrowTimestampType>(
-        origin: i64,
-        stride: i64,
-        stride_fn: BinFunction,
-    ) -> impl Fn(i64) -> Result<i64> {
-        let scale = match T::UNIT {
+    fn timestamp_scale<T: ArrowTimestampType>() -> i64 {
+        match T::UNIT {
             Nanosecond => 1,
             Microsecond => NANOS_PER_MICRO,
             Millisecond => NANOS_PER_MILLI,
             Second => NANOSECONDS,
-        };
-        move |x: i64| match stride_fn(stride, x * scale, origin) {
-            Ok(result) => Ok(result / scale),
-            Err(e) => Err(e),
         }
     }
 
+    fn timestamp_scale_overflow_message(x: i64) -> String {

Review Comment:
   since we're not doing the mapping b/w datafusion & arrow errors anymore 
(since we're not using try_unary now) we might as well have this function just 
return the datafusion error for simplicity, instead of only the message
   
   (my previous comment was alluding to the fact that we might be better off 
constructing an arrow & datafusion version of the errors instead of doing only 
datafusion error and mapping to back to arrow (to then map it to datafusion 
again) but since we're not doing this anymore, we can just have this be 
datafusion error only)



##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -710,20 +765,26 @@ fn date_bin_impl(
                 T: ArrowTimestampType,
             {
                 let array = as_primitive_array::<T>(array)?;
-                let scale = match T::UNIT {
-                    Nanosecond => 1,
-                    Microsecond => NANOS_PER_MICRO,
-                    Millisecond => NANOS_PER_MILLI,
-                    Second => NANOSECONDS,
-                };
-
-                let result: PrimitiveArray<T> = array.try_unary(|val| {
-                    stride_fn(stride, val * scale, origin)
-                        .map(|binned| binned / scale)
-                        .map_err(|e| {
-                            
arrow::error::ArrowError::ComputeError(e.to_string())
-                        })
-                })?;
+                let scale = timestamp_scale::<T>();
+
+                let values = array
+                    .iter()
+                    .map(|val| match val {
+                        Some(val) => {
+                            let scaled = val.checked_mul(scale).ok_or_else(|| {
+                                DataFusionError::Execution(
+                                    timestamp_scale_overflow_message(val),
+                                )
+                            })?;
+                            Ok(stride_fn(stride, scaled, origin)
+                                .ok()

Review Comment:
   cc @kumarUjjawal I think this will overlap with your PR
   
   - https://github.com/apache/datafusion/pull/22610



##########
datafusion/functions/src/datetime/date_bin.rs:
##########
@@ -699,6 +746,14 @@ fn date_bin_impl(
             ColumnarValue::Scalar(ScalarValue::Time64Microsecond(result))
         }
         ColumnarValue::Array(array) => {
+            fn df_to_arrow(e: DataFusionError) -> ArrowError {

Review Comment:
   I think it's preferably to remove this new `df_to_arrow` function since 
we're only using one arm of it (mapping to compute error)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to