alamb commented on code in PR #10120:
URL: https://github.com/apache/arrow-rs/pull/10120#discussion_r3431451628


##########
arrow-buffer/src/buffer/offset.rs:
##########
@@ -326,6 +326,78 @@ impl<O: ArrowNativeType> OffsetBuffer<O> {
 
         end_offset_of_last_valid_value != last_offset
     }
+
+    /// Subtract `rhs` from all offsets
+    /// This will try to reuse the existing allocation as much as possible
+    ///
+    /// Panics: this will panic if `rhs` > the first offset or if `rhs` will 
lead to overflow (when `rhs` is negative)
+    pub fn subtract(self, rhs: O) -> Self
+    where
+        O: std::ops::Sub<Output = O> + std::cmp::PartialOrd + 
num_traits::CheckedSub,
+    {
+        if rhs == O::usize_as(0) {
+            return self;
+        }
+
+        let len = self.len();
+
+        // Offset buffer is guaranteed to be non-empty
+        assert!(
+            self[0] >= rhs,
+            "shifted offsets will become negative which is not allowed"
+        );
+
+        // If negative, make sure that this will not create an overflow
+        if rhs < O::usize_as(0) {
+            self[len - 1].checked_sub(&rhs).expect("must not overflow");
+        }
+
+        let original_buffer = self.into_inner().into_inner();
+        let original_length = original_buffer.len();
+        let buffer_offset = original_buffer.ptr_offset();
+
+        // Remove this once https://github.com/apache/arrow-rs/issues/10117 is 
resolved.

Review Comment:
   I don't follow the issue described in 
https://github.com/apache/arrow-rs/issues/10117
   
   I also think we can accomplish the same thing using `into_vec` which is less 
code and possibly faster (as Vec is crazy fast):
   
   
   ```rust
       /// Subtract `rhs` from all offsets
       /// This will try to reuse the existing allocation as much as possible
       ///
       /// Panics: this will panic if `rhs` > the first offset or if `rhs` will 
lead to overflow (when `rhs` is negative)
       pub fn subtract(self, rhs: O) -> Self
       where
           O: std::ops::Sub<Output = O> + std::cmp::PartialOrd + 
num_traits::CheckedSub,
       {
           if rhs == O::usize_as(0) {
               return self;
           }
   
           let len = self.len();
   
           // Offset buffer is guaranteed to be non-empty
           assert!(
               self[0] >= rhs,
               "shifted offsets will become negative which is not allowed"
           );
   
           // If negative, make sure that this will not create an overflow
           if rhs < O::usize_as(0) {
               self[len - 1].checked_sub(&rhs).expect("must not overflow");
           }
   
           // try and reuse buffer:
           let shifted_offsets: Vec<O> = match 
self.into_inner().into_inner().into_vec() {
               // If we can reuse the buffer, update in place
               Ok(mut v) => {
                   for offset in v.iter_mut() {
                       *offset = *offset - rhs;
                   }
                   v
               }
               // otherwise, buffer is shared so we need a copy
               Err(buffer) => {
                   let offsets = ScalarBuffer::<O>::from(buffer);
                   offsets.iter().map(|offset| *offset - rhs).collect()
               }
           };
           let shifted_buffer = ScalarBuffer::from(shifted_offsets);
           // Safety: offsets are valid as they are coming from a valid
           // offset buffer and we checked overflow above, and we
           // subtracted the same value from all offsets, thus keeping the
           // same properties as the input buffer
           unsafe { Self::new_unchecked(shifted_buffer) }
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to