Rich-T-kid opened a new issue, #10017:
URL: https://github.com/apache/arrow-rs/issues/10017
## `RunArray::slice()` should trim the underlying values array
`RunArray::slice()` currently tracks slicing logically without adjusting the
underlying values array, so a sliced `RunArray` still references the full
original values. This has performance implications for patterns that operate on
`array.values()`.
### Example
```rust
use arrow_array::{Int32Array, RunArray};

let run = Int32Array::from(vec![3, 6, 9]);
let values = Int32Array::from(vec![1, 2, 3]);
// Logical contents: [1, 1, 1, 2, 2, 2, 3, 3, 3]
let array = RunArray::try_new(&run, &values).unwrap();
// Logically [2, 2, 2], but still references the original arrays:
// {1, 1, 1, [2, 2, 2], 3, 3, 3}
let array_sliced = array.slice(3, 3);
```
Ideally `array_sliced` would be `{ run_ends: [3], values: [2] }`, but the
[current implementation of
`RunArray::slice()`](https://github.com/apache/arrow-rs/blob/8acab7b5371470deff6c211899295d2bb3030dfc/arrow-array/src/array/run_array.rs#L344)
preserves the full values array.
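To make the desired result concrete, here is a minimal sketch of the trimming logic on plain slices. It is not the arrow-rs implementation: `trim_run_slice` is a hypothetical helper, and it models run ends and values as `&[i32]` rather than Arrow arrays. It finds the first and last runs that overlap the logical range, rebases the run ends to the slice, and clamps the final run end to the slice length.

```rust
/// Hypothetical helper (not part of arrow-rs): given run ends and values as
/// plain slices, return the trimmed run ends and values that cover the
/// logical range `offset..offset + len`.
fn trim_run_slice(
    run_ends: &[i32],
    values: &[i32],
    offset: usize,
    len: usize,
) -> (Vec<i32>, Vec<i32>) {
    assert_eq!(run_ends.len(), values.len());
    // The last run end is the logical length of the array.
    let total = run_ends.last().copied().unwrap_or(0) as usize;
    assert!(offset + len <= total, "slice out of bounds");
    if len == 0 {
        return (Vec::new(), Vec::new());
    }
    let end = offset + len;
    // Index of the run containing the first logical element of the slice:
    // the first run whose end lies strictly past `offset`.
    let first = run_ends.partition_point(|&e| (e as usize) <= offset);
    // Index of the run containing the last logical element of the slice:
    // the first run whose end reaches `end`.
    let last = run_ends.partition_point(|&e| (e as usize) < end);
    // Rebase run ends to start at zero and clamp the final run to `end`.
    let new_run_ends = run_ends[first..=last]
        .iter()
        .map(|&e| ((e as usize).min(end) - offset) as i32)
        .collect();
    (new_run_ends, values[first..=last].to_vec())
}
```

For the example above, `trim_run_slice(&[3, 6, 9], &[1, 2, 3], 3, 3)` returns `([3], [2])`. A slice that cuts through runs, such as offset 2 and length 5, keeps all three values but rebases the run ends to `[1, 4, 5]`.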
### Why it matters
Patterns like the following do extra work on values outside the slice:
```rust
let values = date_part(array.values(), part)?;
let new_array = array.with_values(values);
```
This is not a correctness bug, but the wasted work is worth avoiding.
### Regression checks
Each call site of `RunArray::slice()` needs to be reviewed to ensure this
change doesn't introduce breaking behavior.
### Reference
[Original comment thread](https://github.com/apache/arrow-rs/pull/9959#discussion_r3270678608)