Rich-T-kid opened a new issue, #10017:
URL: https://github.com/apache/arrow-rs/issues/10017

   ## `RunArray::slice()` should trim the underlying values array
   
   `RunArray::slice()` currently tracks the slice logically (offset and length) 
without adjusting the underlying values array, so a sliced `RunArray` still 
references the full original values. Code that operates on `array.values()` 
therefore processes values that lie outside the slice.
   
   ### Example
   
   ```rust
   use arrow_array::{Int32Array, RunArray};

   // Run ends are cumulative: each value below covers three logical elements.
   let run = Int32Array::from(vec![3, 6, 9]);
   let values = Int32Array::from(vec![1, 2, 3]);
   // Logically [1, 1, 1, 2, 2, 2, 3, 3, 3]
   let array = RunArray::try_new(&run, &values).unwrap();
   
   // Logically [2, 2, 2], but still references the original arrays:
   // {1, 1, 1, [2, 2, 2], 3, 3, 3}
   let array_sliced = array.slice(3, 3);
   ```
   
   Ideally `array_sliced` would be `{ run_ends: [3], values: [2] }`, but the 
[current implementation of 
`RunArray::slice()`](https://github.com/apache/arrow-rs/blob/8acab7b5371470deff6c211899295d2bb3030dfc/arrow-array/src/array/run_array.rs#L344)
 preserves the full values array.
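   
   A minimal sketch of the trimming described above, written against plain 
slices rather than Arrow arrays. `trim_runs` is a hypothetical helper, not an 
arrow-rs API, and it assumes the requested window lies within the logical 
length of the array:
   
   ```rust
   /// Hypothetical helper (not part of arrow-rs): returns the run ends and
   /// values that cover only the logical window `offset..offset + len`.
   /// Assumes `offset + len` does not exceed the last run end.
   fn trim_runs(
       run_ends: &[i32],
       values: &[i32],
       offset: usize,
       len: usize,
   ) -> (Vec<i32>, Vec<i32>) {
       if len == 0 {
           return (Vec::new(), Vec::new());
       }
       let end = offset + len;
       // Index of the run that contains logical position `offset`.
       let first = run_ends.partition_point(|&e| (e as usize) <= offset);
       // Index of the run that contains logical position `end - 1`.
       let last = run_ends.partition_point(|&e| (e as usize) < end);
       // Clamp each run end to the window, then rebase it so the slice starts at 0.
       let new_run_ends = run_ends[first..=last]
           .iter()
           .map(|&e| ((e as usize).min(end) - offset) as i32)
           .collect();
       (new_run_ends, values[first..=last].to_vec())
   }
   ```
   
   For the example above, `trim_runs(&[3, 6, 9], &[1, 2, 3], 3, 3)` yields run 
ends `[3]` and values `[2]`. A window that starts or ends inside a run keeps 
that run but shortens it.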
   
   ### Why it matters
   
   Patterns like the following do extra work on values outside the slice:
   
   ```rust
   let values = date_part(array.values(), part)?;
   let new_array = array.with_values(values);
   ```
   
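   This is not a correctness bug, but the wasted work scales with the size of 
the original values array rather than with the slice, so it is worth avoiding.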
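   
   To illustrate the cost, here is a rough model using plain slices instead of 
Arrow arrays. `physical_range` and `map_covered` are hypothetical names, not 
arrow-rs APIs, and the sketch assumes the window is in bounds. It applies a 
function only to the values that the slice actually covers:
   
   ```rust
   use std::ops::Range;
   
   /// Hypothetical helper (not an arrow-rs API): physical indices of the runs
   /// that overlap the logical window `offset..offset + len`.
   fn physical_range(run_ends: &[i32], offset: usize, len: usize) -> Range<usize> {
       if len == 0 {
           return 0..0;
       }
       let end = offset + len;
       let first = run_ends.partition_point(|&e| (e as usize) <= offset);
       let last = run_ends.partition_point(|&e| (e as usize) < end);
       first..last + 1
   }
   
   /// Applies `f` only to the values covered by the slice, so the number of
   /// calls depends on the slice and not on the original array.
   fn map_covered<F: Fn(i32) -> i32>(
       run_ends: &[i32],
       values: &[i32],
       offset: usize,
       len: usize,
       f: F,
   ) -> Vec<i32> {
       values[physical_range(run_ends, offset, len)]
           .iter()
           .map(|&v| f(v))
           .collect()
   }
   ```
   
   With the arrays from the example, the slice `(3, 3)` covers one value out of 
three, so `f` runs once instead of three times.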
   
   ### Regression checks
   
   Each call site of `RunArray::slice()` needs to be reviewed to ensure this 
change doesn't introduce breaking behavior.
   
   ### Reference
   
   [Original comment thread](https://github.com/apache/arrow-rs/pull/9959#discussion_r3270678608)

