Jefffrey commented on code in PR #9036:
URL: https://github.com/apache/arrow-rs/pull/9036#discussion_r2652097491


##########
arrow-select/src/filter.rs:
##########
@@ -497,7 +498,7 @@ where
 
     let pred: BooleanArray = BooleanBuffer::collect_bool(run_ends.len(), |i| {

Review Comment:
   I'm still thinking about this a bit; it seems odd that we still iterate 
through all of the `run_ends` including the values before/after the slice. I 
wonder if there is a better way to handle it, or maybe I misunderstand how this 
logic is working 🤔 



##########
arrow-select/src/concat.rs:
##########
@@ -377,20 +381,30 @@ where
         
PrimitiveArray::<R>::from_iter_values(run_arrays.iter().enumerate().flat_map(
             move |(i, run_array)| {
                 let adjustment = needed_run_end_adjustments[i];
+                let offset = 
R::Native::from_usize(run_array.offset()).unwrap();
+                let length = R::Native::from_usize(run_array.len()).unwrap();
+                let start = run_array.get_start_physical_index();
+                let len = run_array.get_end_physical_index() - start + 1;
+
                 run_array
                     .run_ends()
                     .values()
                     .iter()
-                    .map(move |run_end| *run_end + adjustment)
+                    .skip(start)
+                    .take(len)
+                    .map(move |run_end| {
+                        let value = *run_end - offset;
+                        value.min(length) + adjustment

Review Comment:
   Could we add a comment to that effect here? Something along the lines of:
   
   > min required for cases where slice ends in the middle of the run at the end



##########
arrow-select/src/concat.rs:
##########
@@ -1879,4 +1886,34 @@ mod tests {
         assert_eq!(values.len(), 6);
         assert_eq!(&[10, 20, 30, 40, 50, 60], values.values());
     }
+
+    #[test]
+    fn test_concat_run_array_with_truncated_run() {
+        // Create a run array with run ends [2, 5] and values [10, 20]
+        // Logical: [10, 10, 20, 20, 20]
+        let run_ends1 = Int32Array::from(vec![2, 5]);
+        let values1 = Int32Array::from(vec![10, 20]);
+        let array1 = RunArray::try_new(&run_ends1, &values1).unwrap();
+
+        // Slice(0, 3) -> [10, 10, 20]
+        // The last physical run for this slice is the second run (ends at 5).
+        // Without min(length), its run end would be 5 - 0 = 5.
+        // With min(length), it becomes min(5 - 0, 3) = 3.

Review Comment:
   These comments are nice, but they are quite separated from the 
implementation so they are a bit confusing (if someone were to look at this 
test, how would they understand what `min(length)` is referring to here?)



##########
arrow-buffer/src/buffer/run.rs:
##########
@@ -189,6 +189,16 @@ where
         &self.run_ends
     }
 
+    /// Similar to [`values`] but accounts for logical slicing, returning only 
the values
+    /// that are part of the logical slice of this buffer.
+    ///
+    /// [`values`]: Self::values
+    pub fn values_slice(&self) -> &[E] {

Review Comment:
   I think for this docstring we also need a clear warning note that the first 
& last run values may be inaccurate as they don't take into account the logical 
slicing. I wonder if there is a better way to handle this without too much 
overhead 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to