[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2643: Add overflow-checking variant for primitive arithmetic kernels and explicitly define overflow behavior

GitBox Sat, 03 Sep 2022 11:26:26 -0700


tustvold commented on code in PR #2643:
URL: https://github.com/apache/arrow-rs/pull/2643#discussion_r962188392



##########
arrow/src/compute/kernels/arithmetic.rs:
##########
@@ -933,15 +1065,34 @@ where
 
 /// Perform `left * right` operation on two arrays. If either left or right 
value is null
 /// then the result is also null.
+///
+/// This doesn't detect overflow. Once overflowing, the result will wrap 
around.
+/// For an overflow-checking variant, use `multiply_check` instead.
 pub fn multiply<T>(
     left: &PrimitiveArray<T>,
     right: &PrimitiveArray<T>,
 ) -> Result<PrimitiveArray<T>>
 where
     T: datatypes::ArrowNumericType,
-    T::Native: Mul<Output = T::Native>,
+    T::Native: ArrowNativeTypeOp,
+{
+    math_op(left, right, |a, b| a.wrapping_mul_if_applied(b))
+}
+
+/// Perform `left * right` operation on two arrays. If either left or right 
value is null
+/// then the result is also null.
+///
+/// This detects overflow and returns an `Err` for that. For an 
non-overflow-checking variant,
+/// use `multiply` instead.
+pub fn multiply_check<T>(

Review Comment:
   ```suggestion
   pub fn multiply_checked<T>(
   ```
   



##########
arrow/src/compute/kernels/arithmetic.rs:
##########
@@ -103,6 +104,99 @@ where
     Ok(PrimitiveArray::<LT>::from(data))
 }
 
+/// This is similar to `math_op` as it performs given operation between two 
input primitive arrays.
+/// But the given operation can return `None` if overflow is detected. For the 
case, this function
+/// returns an `Err`.
+fn math_checked_op<LT, RT, F>(
+    left: &PrimitiveArray<LT>,
+    right: &PrimitiveArray<RT>,
+    op: F,
+) -> Result<PrimitiveArray<LT>>
+where
+    LT: ArrowNumericType,
+    RT: ArrowNumericType,
+    F: Fn(LT::Native, RT::Native) -> Option<LT::Native>,
+{
+    if left.len() != right.len() {
+        return Err(ArrowError::ComputeError(
+            "Cannot perform math operation on arrays of different 
length".to_string(),
+        ));
+    }
+
+    let left_iter = ArrayIter::new(left);
+    let right_iter = ArrayIter::new(right);
+
+    let values: Result<Vec<Option<<LT as ArrowPrimitiveType>::Native>>> = 
left_iter
+        .into_iter()
+        .zip(right_iter.into_iter())
+        .map(|(l, r)| {
+            if let (Some(l), Some(r)) = (l, r) {
+                let result = op(l, r);
+                if let Some(r) = result {
+                    Ok(Some(r))
+                } else {
+                    // Overflow
+                    Err(ArrowError::ComputeError("Overflow 
happened".to_string()))

Review Comment:
   Perhaps we could print the problematic values



##########
arrow/src/compute/kernels/arithmetic.rs:
##########
@@ -760,15 +854,34 @@ where
 
 /// Perform `left + right` operation on two arrays. If either left or right 
value is null
 /// then the result is also null.
+///
+/// This doesn't detect overflow. Once overflowing, the result will wrap 
around.
+/// For an overflow-checking variant, use `add_checked` instead.
 pub fn add<T>(
     left: &PrimitiveArray<T>,
     right: &PrimitiveArray<T>,
 ) -> Result<PrimitiveArray<T>>
 where
     T: ArrowNumericType,
-    T::Native: Add<Output = T::Native>,
+    T::Native: ArrowNativeTypeOp,
+{
+    math_op(left, right, |a, b| a.wrapping_add_if_applied(b))

Review Comment:
   👌



##########
arrow/src/datatypes/native.rs:
##########
@@ -114,6 +115,98 @@ pub trait ArrowPrimitiveType: 'static {
     }
 }
 
+/// Trait for ArrowNativeType to provide overflow-aware operations.
+pub trait ArrowNativeTypeOp:
+    ArrowNativeType
+    + Add<Output = Self>
+    + Sub<Output = Self>

Review Comment:
   I find the default impls a bit confusing here, it took me a bit to realise 
they're only used by the floating point types



##########
arrow/src/datatypes/native.rs:
##########
@@ -114,6 +115,98 @@ pub trait ArrowPrimitiveType: 'static {
     }
 }
 
+/// Trait for ArrowNativeType to provide overflow-aware operations.
+pub trait ArrowNativeTypeOp:
+    ArrowNativeType
+    + Add<Output = Self>
+    + Sub<Output = Self>
+    + Mul<Output = Self>
+    + Div<Output = Self>
+{
+    fn checked_add_if_applied(self, rhs: Self) -> Option<Self> {
+        Some(self + rhs)
+    }
+
+    fn wrapping_add_if_applied(self, rhs: Self) -> Self {
+        self + rhs
+    }
+
+    fn checked_sub_if_applied(self, rhs: Self) -> Option<Self> {
+        Some(self - rhs)
+    }
+
+    fn wrapping_sub_if_applied(self, rhs: Self) -> Self {
+        self + rhs
+    }
+
+    fn checked_mul_if_applied(self, rhs: Self) -> Option<Self> {
+        Some(self * rhs)
+    }
+
+    fn wrapping_mul_if_applied(self, rhs: Self) -> Self {
+        self * rhs
+    }
+
+    fn checked_div_if_applied(self, rhs: Self) -> Option<Self> {
+        Some(self / rhs)
+    }
+
+    fn wrapping_div_if_applied(self, rhs: Self) -> Self {
+        self / rhs
+    }
+}
+
+macro_rules! native_type_op {
+    ($t:tt) => {
+        impl ArrowNativeTypeOp for $t {
+            fn checked_add_if_applied(self, rhs: Self) -> Option<Self> {

Review Comment:
   Why the if_applied suffix? Is it because wrapping operations are not 
applicable to floating point types? Perhaps we could call them add and 
add_checked to match the kernels?
   



##########
arrow/src/datatypes/native.rs:
##########
@@ -114,6 +115,98 @@ pub trait ArrowPrimitiveType: 'static {
     }
 }
 
+/// Trait for ArrowNativeType to provide overflow-aware operations.
+pub trait ArrowNativeTypeOp:

Review Comment:
   Does this need to be public?



##########
arrow/src/compute/kernels/arithmetic.rs:
##########
@@ -103,6 +104,99 @@ where
     Ok(PrimitiveArray::<LT>::from(data))
 }
 
+/// This is similar to `math_op` as it performs given operation between two 
input primitive arrays.
+/// But the given operation can return `None` if overflow is detected. For the 
case, this function
+/// returns an `Err`.
+fn math_checked_op<LT, RT, F>(
+    left: &PrimitiveArray<LT>,
+    right: &PrimitiveArray<RT>,
+    op: F,
+) -> Result<PrimitiveArray<LT>>
+where
+    LT: ArrowNumericType,
+    RT: ArrowNumericType,
+    F: Fn(LT::Native, RT::Native) -> Option<LT::Native>,
+{
+    if left.len() != right.len() {
+        return Err(ArrowError::ComputeError(
+            "Cannot perform math operation on arrays of different 
length".to_string(),
+        ));
+    }
+
+    let left_iter = ArrayIter::new(left);
+    let right_iter = ArrayIter::new(right);
+
+    let values: Result<Vec<Option<<LT as ArrowPrimitiveType>::Native>>> = 
left_iter
+        .into_iter()
+        .zip(right_iter.into_iter())
+        .map(|(l, r)| {
+            if let (Some(l), Some(r)) = (l, r) {
+                let result = op(l, r);
+                if let Some(r) = result {
+                    Ok(Some(r))
+                } else {
+                    // Overflow
+                    Err(ArrowError::ComputeError("Overflow 
happened".to_string()))
+                }
+            } else {
+                Ok(None)
+            }
+        })
+        .collect();
+
+    let values = values?;
+
+    Ok(PrimitiveArray::<LT>::from_iter(values))
+}
+
+/// This is similar to `math_checked_op` but just for divide op.
+fn math_checked_divide<LT, RT, F>(
+    left: &PrimitiveArray<LT>,
+    right: &PrimitiveArray<RT>,
+    op: F,
+) -> Result<PrimitiveArray<LT>>
+where
+    LT: ArrowNumericType,
+    RT: ArrowNumericType,
+    RT::Native: One + Zero,
+    F: Fn(LT::Native, RT::Native) -> Option<LT::Native>,
+{
+    if left.len() != right.len() {
+        return Err(ArrowError::ComputeError(
+            "Cannot perform math operation on arrays of different 
length".to_string(),
+        ));
+    }
+
+    let left_iter = ArrayIter::new(left);
+    let right_iter = ArrayIter::new(right);
+
+    let values: Result<Vec<Option<<LT as ArrowPrimitiveType>::Native>>> = 
left_iter
+        .into_iter()
+        .zip(right_iter.into_iter())
+        .map(|(l, r)| {
+            if let (Some(l), Some(r)) = (l, r) {
+                let result = op(l, r);
+                if let Some(r) = result {
+                    Ok(Some(r))
+                } else {
+                    if r.is_zero() {
+                        
Err(ArrowError::ComputeError("DivideByZero".to_string()))
+                    } else {
+                        // Overflow
+                        Err(ArrowError::ComputeError("Overflow 
happened".to_string()))

Review Comment:
   Same here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2643: Add overflow-checking variant for primitive arithmetic kernels and explicitly define overflow behavior

Reply via email to