etseidl opened a new issue, #6243:
URL: https://github.com/apache/arrow-rs/issues/6243

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Related to #6219 
   
   While investigating #6219, I found several locations where transforming an array of type `T` to type `U` uses the pattern:
   ```rust
   // `iter()` yields `Option<T>`; null slots come through as `None`
   let array_u = array_t
       .iter()
       .map(|o| o.map(|v| v as U))
       .collect::<PrimitiveArray<U>>();
   ```
   The initial `iter()` call yields `Option<T>` (with nulls becoming `None`), which is then mapped to `Option<U>`. The resulting iterator is passed to `from_iter()`
   https://github.com/apache/arrow-rs/blob/a693f0f9c37567b2b121e261fc0a4587776d5ca4/arrow-array/src/array/primitive_array.rs#L1318
   which unwraps each `Option<U>`, collects the values into a `Buffer`, and rebuilds the original null buffer one value at a time.
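
To make that cost concrete, here is a minimal std-only model of the `from_iter()` path. It is a sketch under stated assumptions, not the arrow-rs code: `collect_with_validity` is a hypothetical name, and the validity bitmap is a plain `Vec<u8>` (LSB-first, 1 = valid) rather than Arrow's `NullBuffer`. Every element pays for a branch and a bit write, even though the source array already had an equivalent null buffer.

```rust
/// Hypothetical model of collecting `Option<T>` items into an array:
/// values and a validity bitmap (LSB-first, 1 = valid) are built one
/// element at a time. Null slots are filled with `T::default()`.
pub fn collect_with_validity<T, I>(iter: I) -> (Vec<T>, Vec<u8>)
where
    T: Default,
    I: IntoIterator<Item = Option<T>>,
{
    let mut values = Vec::new();
    let mut validity: Vec<u8> = Vec::new();
    for (i, item) in iter.into_iter().enumerate() {
        // Start a new bitmap byte every 8 slots (new bits default to null).
        if i % 8 == 0 {
            validity.push(0);
        }
        match item {
            Some(v) => {
                values.push(v);
                // Set bit `i` to mark this slot as valid.
                validity[i / 8] |= 1u8 << (i % 8);
            }
            // Null: keep a placeholder value and leave the bit cleared.
            None => values.push(T::default()),
        }
    }
    (values, validity)
}
```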
   
   **Describe the solution you'd like**
   We can avoid one `map` and the null buffer creation by instead mapping `Option<T>` directly to `U` and cloning the original null buffer. Adding a `from_iter_values_with_nulls()` function to `PrimitiveArray` would allow the above to become:
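   ```rust
   // Map every slot to a plain `U`; null slots get a placeholder value,
   // and validity comes from the cloned null buffer instead.
   let iter = array_t
       .iter()
       .map(|o| match o {
           Some(v) => v as U,
           None => U::default(),
       });
   let array_u =
       PrimitiveArray::<U>::from_iter_values_with_nulls(iter, array_t.nulls().cloned());
   ```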
   Since the null buffer no longer has to be rebuilt value by value, this results in a pretty dramatic performance increase.
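
The proposed approach can be modeled the same way. `ToyArray`, `widen`, and the toy `from_iter_values_with_nulls` below are hypothetical std-only stand-ins: they assume a `Vec<u8>` bitmap instead of a `NullBuffer`, and concrete `i32`/`i64` in place of `T`/`U`, so they mirror the shape of the proposal rather than any actual arrow-rs signature. Values are collected straight from the iterator, and the null buffer is cloned instead of rebuilt.

```rust
/// Hypothetical, simplified stand-in for a primitive array: a values
/// buffer plus an optional validity bitmap (LSB-first, 1 = valid).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyArray<T> {
    pub values: Vec<T>,
    pub nulls: Option<Vec<u8>>,
}

impl<T: Copy> ToyArray<T> {
    /// Mirrors the proposed constructor: the iterator yields one plain
    /// value per slot (nulls included) and the null buffer is passed in.
    pub fn from_iter_values_with_nulls<I: IntoIterator<Item = T>>(
        iter: I,
        nulls: Option<Vec<u8>>,
    ) -> Self {
        ToyArray { values: iter.into_iter().collect(), nulls }
    }

    /// A slot is valid if there is no bitmap or its bit is set.
    pub fn is_valid(&self, i: usize) -> bool {
        match &self.nulls {
            Some(bits) => bits[i / 8] & (1u8 << (i % 8)) != 0,
            None => true,
        }
    }

    /// Like `PrimitiveArray::iter()`: yields `None` for null slots.
    pub fn iter(&self) -> impl Iterator<Item = Option<T>> + '_ {
        (0..self.values.len()).map(move |i| self.is_valid(i).then(|| self.values[i]))
    }
}

/// Widen `i32` to `i64` with the proposed pattern: map each `Option<i32>`
/// straight to an `i64` (nulls become 0) and clone the existing bitmap.
pub fn widen(array: &ToyArray<i32>) -> ToyArray<i64> {
    let iter = array.iter().map(|o| match o {
        Some(v) => v as i64,
        None => i64::default(),
    });
    ToyArray::from_iter_values_with_nulls(iter, array.nulls.clone())
}
```

Because null slots still receive a placeholder value, the values buffer keeps the same length as the source array, which is what allows the original null buffer to be reused unchanged.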
   
   **Describe alternatives you've considered**
   An alternative is to modify the `Array` trait to add some sort of transform function that could consume the null buffer from the original array rather than cloning it. However, given that the use cases I know of are Parquet-related, I'm not sure that's the right approach.
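
A consuming transform along those lines might look roughly like the following hypothetical std-only sketch; `ToyArray` and `map_values` are illustrative names, not part of the `Array` trait. Taking `self` by value lets the null buffer be moved into the result instead of cloned. The trade-off is that the closure also runs on null slots, so it must tolerate whatever placeholder values they hold.

```rust
/// Hypothetical simplified array: values plus an optional validity
/// bitmap (LSB-first, 1 = valid).
#[derive(Debug, PartialEq)]
pub struct ToyArray<T> {
    pub values: Vec<T>,
    pub nulls: Option<Vec<u8>>,
}

impl<T> ToyArray<T> {
    /// Hypothetical consuming transform: `self` is taken by value, so the
    /// null buffer is moved into the output rather than cloned. `f` is
    /// applied to every slot, including null ones.
    pub fn map_values<U, F: FnMut(T) -> U>(self, f: F) -> ToyArray<U> {
        let ToyArray { values, nulls } = self;
        ToyArray { values: values.into_iter().map(f).collect(), nulls }
    }
}
```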
   


-- 
This is an automated message from the Apache Git Service.