Jefffrey commented on code in PR #9959:
URL: https://github.com/apache/arrow-rs/pull/9959#discussion_r3285893796


##########
arrow-array/src/array/run_array.rs:
##########
@@ -781,6 +789,37 @@ where
         RunArrayIter::new(self)
     }
 }
+/// An array that can be downcast to a [`RunArray`] of any run end type and any value type.
+///
+/// This can be used to efficiently implement kernels for all possible run end
+/// types without needing to create specialized implementations for each key type.
+pub trait AnyRunEndArray: Array {
+    /// Returns the run ends of this array as a primitive array.
+    fn run_ends(&self) -> ArrayRef;
+
+    /// Returns the values of this array.
+    fn values(&self) -> &Arc<dyn Array>;
+
+    /// Returns a new run-end encoded array with the given values, preserving the
+    /// existing run ends.
+    fn with_values(&self, values: ArrayRef) -> ArrayRef;
+}

Review Comment:
   So if there's a case like this:
   
   ```rust
       #[test]
       fn test123() {
           let run = Int32Array::from(vec![3, 6, 9]);
           let values = Int32Array::from(vec![1, 2, 3]);
           // [1, 1, 1, 2, 2, 2, 3, 3, 3]
           let array = RunArray::try_new(&run, &values).unwrap();
   
           // [2, 2, 2]
           // But still references same underlying arrays as above
           // Essentially:
           // {1, 1, 1, [2, 2, 2], 3, 3, 3}
           let array_sliced = array.slice(3, 3);
   
           let new_values = Int32Array::from(vec![7, 8, 9]);
           // [8, 8, 8]
           let new_array_sliced = array_sliced.with_values(Arc::new(new_values));
       }
   ```
   
   When we slice the array we still hold on to the original values array, so
   `array_sliced.values()` would return `[1, 2, 3]` even though `array_sliced`
   only contains the value `2` repeated. This isn't too big an issue, as I think
   at most it's just a performance thing. For example:
   
   ```rust
   let values = date_part(array.values(), part)?;
   let new_array = array.with_values(values);
   ```
   
   If `array` is sliced, then we could be doing extra work, since we're
   calculating `date_part` for values which aren't in the slice.
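   
   To make the extra work concrete, here is a rough sketch using a toy model in
   plain Rust. It is not the arrow-rs `RunArray`; `ToyRunArray`,
   `physical_range` and `referenced_values` are hypothetical names made up for
   illustration. It assumes slicing only adjusts a logical offset and length,
   which is how the behaviour is described above.
   
   ```rust
   /// Toy stand-in for a run-end encoded array (NOT the arrow-rs type).
   #[derive(Debug, Clone, PartialEq)]
   struct ToyRunArray {
       run_ends: Vec<i32>, // cumulative logical end of each run
       values: Vec<i32>,   // one value per run
       offset: usize,      // logical offset introduced by slicing
       len: usize,         // logical length after slicing
   }
   
   impl ToyRunArray {
       fn new(run_ends: Vec<i32>, values: Vec<i32>) -> Self {
           let len = run_ends.last().copied().unwrap_or(0) as usize;
           Self { run_ends, values, offset: 0, len }
       }
   
       /// Slicing only moves `offset` and `len`; `run_ends` and `values`
       /// are carried over unchanged.
       fn slice(&self, offset: usize, len: usize) -> Self {
           assert!(offset + len <= self.len, "slice out of bounds");
           Self { offset: self.offset + offset, len, ..self.clone() }
       }
   
       /// Physical indices of the runs that overlap the logical window.
       fn physical_range(&self) -> std::ops::Range<usize> {
           if self.len == 0 {
               return 0..0;
           }
           // First run whose end lies beyond the start of the window.
           let start = self
               .run_ends
               .partition_point(|&e| (e as usize) <= self.offset);
           // Run that contains the last logical element of the window.
           let end = self
               .run_ends
               .partition_point(|&e| (e as usize) < self.offset + self.len);
           start..end + 1
       }
   
       /// Only the values the slice actually refers to.
       fn referenced_values(&self) -> &[i32] {
           &self.values[self.physical_range()]
       }
   }
   ```
   
   With run ends `[3, 6, 9]` and values `[1, 2, 3]`, `slice(3, 3)` leaves
   `values` at three entries while `referenced_values()` yields only the single
   `2`, so a kernel applied to all of `values` would process two entries the
   slice never reads.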
   
   But for the `run_ends()` method it's harder to judge, since we don't
   currently have a use case for it. Is it a potential footgun that calling
   `run_ends()` on a sliced run array ignores any slicing? 🤔
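   
   For comparison, a slice-aware view of the run ends could look roughly like
   the sketch below. This is plain Rust over a slice of raw run ends;
   `sliced_run_ends` is a hypothetical helper, not something in the PR or in
   arrow-rs, and it assumes the slice is described by a logical `offset` and
   `len`.
   
   ```rust
   /// Hypothetical helper: rebase raw run ends onto the logical window
   /// `[offset, offset + len)` so that they describe only the sliced data.
   fn sliced_run_ends(run_ends: &[i32], offset: usize, len: usize) -> Vec<i32> {
       if len == 0 {
           return Vec::new();
       }
       let start = offset as i64;
       let end = (offset + len) as i64;
       let mut out = Vec::new();
       for &e in run_ends {
           let e = e as i64;
           if e <= start {
               continue; // this run ends before the window begins
           }
           // Clamp to the window and shift so the window starts at zero.
           out.push((e.min(end) - start) as i32);
           if e >= end {
               break; // this run covers the end of the window
           }
       }
       out
   }
   ```
   
   In this model, raw run ends `[3, 6, 9]` with `offset = 3` and `len = 3`
   become `[3]`, whereas returning the raw buffer would still give `[3, 6, 9]`
   and leave it to the caller to apply the offset.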


