etseidl opened a new issue, #6243:
URL: https://github.com/apache/arrow-rs/issues/6243
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
Related to #6219
While investigating #6219, I found several locations where transforming an
array of type `T` to type `U` uses the pattern
```rust
let array_u = array_t
    .iter()
    .map(|o| o.map(|v| v as U))
    .collect::<PrimitiveArray<U>>();
```
The initial `iter()` call yields `Option<T>` (with nulls becoming `None`),
which is then mapped to `Option<U>`. The resulting iterator is passed to `from_iter()`
https://github.com/apache/arrow-rs/blob/a693f0f9c37567b2b121e261fc0a4587776d5ca4/arrow-array/src/array/primitive_array.rs#L1318
which unwraps each `Option<U>`, collects the values into a `Buffer`, and
rebuilds the original null buffer one value at a time.
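
To illustrate the per-element work described above, here is a minimal, standard-library-only sketch. `ToyArray`, `collect_options`, and `cast_via_options` are hypothetical stand-ins invented for this example (a `Vec` of values plus a `Vec<bool>` validity mask); they are not arrow-rs types and only approximate what `from_iter()` does.

```rust
/// Toy stand-in for a primitive array: a value buffer plus an optional
/// validity mask (`true` = valid). This is NOT an arrow-rs type.
#[derive(Debug, PartialEq)]
struct ToyArray<T> {
    values: Vec<T>,
    validity: Option<Vec<bool>>,
}

/// Models the `from_iter()` path: every `Option` is inspected, its validity
/// bit is pushed individually, and nulls are stored as `T::default()`.
fn collect_options<T, I>(iter: I) -> ToyArray<T>
where
    T: Default,
    I: IntoIterator<Item = Option<T>>,
{
    let iter = iter.into_iter();
    let (lower, _) = iter.size_hint();
    let mut values = Vec::with_capacity(lower);
    let mut validity = Vec::with_capacity(lower);
    for item in iter {
        // The validity mask is rebuilt one element at a time.
        validity.push(item.is_some());
        values.push(item.unwrap_or_default());
    }
    ToyArray {
        values,
        validity: Some(validity),
    }
}

/// The cast pattern from this issue, specialised to i32 -> i64.
fn cast_via_options(input: &[Option<i32>]) -> ToyArray<i64> {
    collect_options(input.iter().map(|o| o.map(|v| v as i64)))
}
```

Calling `cast_via_options(&[Some(1), None, Some(3)])` in this toy model produces the values `[1, 0, 3]` together with a freshly built mask `[true, false, true]`, even though the input already knew which slots were null.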
**Describe the solution you'd like**
We can avoid one `map` and the null buffer creation by instead mapping
`Option<T>` to `U`, and cloning the original null buffer. Adding a
`from_iter_values_with_nulls()` function to `PrimitiveArray` allows the above
to become
```rust
let iter = array_t
    .iter()
    .map(|o| match o {
        Some(v) => v as U,
        // Null slots get a placeholder value; validity comes from the cloned null buffer.
        None => U::default(),
    });
PrimitiveArray::<U>::from_iter_values_with_nulls(iter, array_t.nulls().cloned())
```
This results in a pretty dramatic performance increase.
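
As a rough, self-contained sketch of the proposed shape (standard library only): `ToyArray` and its methods below are hypothetical stand-ins, not arrow-rs code, and the real `from_iter_values_with_nulls()` may differ in signature and checks. Note that cloning a `Vec<bool>` here copies the mask, whereas I would expect cloning the real null buffer to be much cheaper.

```rust
/// Toy stand-in for a primitive array (NOT an arrow-rs type).
#[derive(Debug, PartialEq)]
struct ToyArray<T> {
    values: Vec<T>,
    /// `true` = valid, `false` = null; `None` means no nulls at all.
    validity: Option<Vec<bool>>,
}

impl<T: Copy> ToyArray<T> {
    /// Yields `Option<T>`, with null slots becoming `None`.
    fn iter(&self) -> impl Iterator<Item = Option<T>> + '_ {
        self.values
            .iter()
            .enumerate()
            .map(move |(i, &v)| match &self.validity {
                Some(mask) if !mask[i] => None,
                _ => Some(v),
            })
    }
}

impl<T> ToyArray<T> {
    /// Hypothetical analogue of the proposed constructor: collect plain
    /// values and attach an existing validity mask without rebuilding it.
    fn from_iter_values_with_nulls<I>(iter: I, validity: Option<Vec<bool>>) -> Self
    where
        I: IntoIterator<Item = T>,
    {
        let values: Vec<T> = iter.into_iter().collect();
        if let Some(mask) = &validity {
            assert_eq!(mask.len(), values.len(), "mask length must match values");
        }
        ToyArray { values, validity }
    }
}

/// The cast from this issue, specialised to i32 -> i64.
fn cast_with_cloned_nulls(input: &ToyArray<i32>) -> ToyArray<i64> {
    // Map Option<i32> straight to i64; nulls become the default value (0).
    let iter = input.iter().map(|o| o.map(|v| v as i64).unwrap_or_default());
    ToyArray::from_iter_values_with_nulls(iter, input.validity.clone())
}
```

The output mask is simply a copy of the input mask, so no per-element validity bookkeeping happens while the values are collected.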
**Describe alternatives you've considered**
An alternative is to modify the `Array` trait to add some sort of transform
function that could consume the null buffer from the original array rather than
cloning it. But given the use cases I know of are Parquet related, I don't know
if that's the right approach.
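
For comparison, a consuming transform might look roughly like the following standard-library-only sketch. `ToyArray` and `map_values` are hypothetical names made up for this illustration; nothing here is an existing or proposed arrow-rs API.

```rust
/// Toy stand-in for a primitive array (NOT an arrow-rs type).
#[derive(Debug, PartialEq)]
struct ToyArray<T> {
    values: Vec<T>,
    /// `true` = valid, `false` = null; `None` means no nulls at all.
    validity: Option<Vec<bool>>,
}

impl<T> ToyArray<T> {
    /// Hypothetical consuming transform: takes the array by value, maps every
    /// slot (including placeholder values in null slots), and moves the
    /// validity mask into the result instead of cloning it.
    fn map_values<U, F>(self, f: F) -> ToyArray<U>
    where
        F: FnMut(T) -> U,
    {
        ToyArray {
            values: self.values.into_iter().map(f).collect(),
            validity: self.validity,
        }
    }
}
```

In this model, `input.map_values(|v| v as i64)` reuses the mask allocation, at the cost of giving up the original array.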
**Additional context**
--
This is an automated message from the Apache Git Service.