Re: [PR] fix:[9018]Fixed RunArray slice offsets [arrow-rs]

via GitHub Thu, 25 Dec 2025 23:55:25 -0800


Jefffrey commented on code in PR #9036:
URL: https://github.com/apache/arrow-rs/pull/9036#discussion_r2647727740



##########
arrow-select/src/filter.rs:
##########
@@ -495,14 +495,21 @@ where
     let filter_values = predicate.filter.values();
     let run_ends = run_ends.inner();
 
+    let offset = array.offset() as u64;
+    let slice_end = (offset + array.len() as u64).min(offset + 
filter_values.len() as u64);
+
     let pred: BooleanArray = BooleanBuffer::collect_bool(run_ends.len(), |i| {
         let mut keep = false;
-        let mut end = run_ends[i].into() as u64;
-        let difference = end.saturating_sub(filter_values.len() as u64);
-        end -= difference;
+        let end = run_ends[i].into() as u64;
+
+        let loop_start = start.max(offset);
+        let loop_end = end.min(slice_end);

Review Comment:
   This min/max logic doesn't seem intuitive to me; from what I understand it 
is because we still iterate through the unsliced parts of the runarray, but 
this min/max guards us to only check the sliced region. Am I correct in my 
understanding here?
   
   If so I wonder can we refactor this to only consider the sliced region from 
the outset?



##########
arrow-array/src/array/run_array.rs:
##########
@@ -1128,4 +1138,35 @@ mod tests {
 
         assert_eq!(array_i16_1, array_i16_2);
     }
+
+    #[test]
+    fn test_run_array_values_slice() {
+        // A, A, B, B, B, C...C (15 Cs)
+        let run_ends: PrimitiveArray<Int32Type> = vec![2, 5, 20].into();
+        let values: PrimitiveArray<Int32Type> = vec![0, 1, 2].into();

Review Comment:
   It's confusing to have the comments refer to values as A,B,C but the values 
are instead 0,1,2
   
   Maybe it's better to have the values as an actual string array to mirror the 
comments



##########
arrow-buffer/src/buffer/run.rs:
##########
@@ -189,6 +189,16 @@ where
         &self.run_ends
     }
 
+    /// Returns the slice of the run-ends array that corresponds to the
+    /// physical indices of this run-end buffer.
+    ///
+    /// This is equivalent to 
`&self.values()[self.get_start_physical_index()..self.get_end_physical_index() 
+ 1]`

Review Comment:
   ```suggestion
       /// Similar to [`values`] but accounts for logical slicing, returning 
only the values
       /// that are part of the logical slice of this buffer.
       ///
       /// [`values`]: Self::values
   ```
   
   Same here



##########
arrow-select/src/filter.rs:
##########
@@ -495,14 +495,21 @@ where
     let filter_values = predicate.filter.values();
     let run_ends = run_ends.inner();
 
+    let offset = array.offset() as u64;
+    let slice_end = (offset + array.len() as u64).min(offset + 
filter_values.len() as u64);

Review Comment:
   Is this min strictly necessary? I don't think it should be possible for 
`filter_values` length to exceed our RunArray length



##########
arrow-select/src/concat.rs:
##########
@@ -377,20 +381,30 @@ where
         
PrimitiveArray::<R>::from_iter_values(run_arrays.iter().enumerate().flat_map(
             move |(i, run_array)| {
                 let adjustment = needed_run_end_adjustments[i];
+                let offset = 
R::Native::from_usize(run_array.offset()).unwrap();
+                let length = R::Native::from_usize(run_array.len()).unwrap();
+                let start = run_array.get_start_physical_index();
+                let len = run_array.get_end_physical_index() - start + 1;
+
                 run_array
                     .run_ends()
                     .values()
                     .iter()
-                    .map(move |run_end| *run_end + adjustment)
+                    .skip(start)
+                    .take(len)

Review Comment:
   Couldn't we use the new method that was introduced to return only the sliced 
values here?



##########
arrow-array/src/array/run_array.rs:
##########
@@ -136,6 +136,16 @@ impl<R: RunEndIndexType> RunArray<R> {
         &self.values
     }
 
+    /// Returns a slice of the values array that corresponds to the
+    /// physical indices of this run-end encoded array.
+    ///
+    /// This is equivalent to 
`self.values().slice(self.get_start_physical_index(), 
self.get_end_physical_index() - self.get_start_physical_index() + 1)`

Review Comment:
   ```suggestion
       /// Similar to [`values`] but accounts for logical slicing, returning 
only the values
       /// that are part of the logical slice of this array.
       ///
       /// [`values`]: Self::values
   ```
   
   No need to include details on what it is equivalent to; people can simply 
check the source code for that



##########
arrow-buffer/src/buffer/run.rs:
##########
@@ -189,6 +189,16 @@ where
         &self.run_ends
     }
 
+    /// Returns the slice of the run-ends array that corresponds to the
+    /// physical indices of this run-end buffer.
+    ///
+    /// This is equivalent to 
`&self.values()[self.get_start_physical_index()..self.get_end_physical_index() 
+ 1]`
+    pub fn run_ends_slice(&self) -> &[E] {

Review Comment:
   It's confusing to name this `run_ends_slice` considering the other method 
here is named `values`



##########
arrow-select/src/concat.rs:
##########
@@ -377,20 +381,30 @@ where
         
PrimitiveArray::<R>::from_iter_values(run_arrays.iter().enumerate().flat_map(
             move |(i, run_array)| {
                 let adjustment = needed_run_end_adjustments[i];
+                let offset = 
R::Native::from_usize(run_array.offset()).unwrap();
+                let length = R::Native::from_usize(run_array.len()).unwrap();
+                let start = run_array.get_start_physical_index();
+                let len = run_array.get_end_physical_index() - start + 1;
+
                 run_array
                     .run_ends()
                     .values()
                     .iter()
-                    .map(move |run_end| *run_end + adjustment)
+                    .skip(start)
+                    .take(len)
+                    .map(move |run_end| {
+                        let value = *run_end - offset;
+                        value.min(length) + adjustment

Review Comment:
   Why is this min necessary?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix:[9018]Fixed RunArray slice offsets [arrow-rs]

Reply via email to