alamb opened a new issue, #9298:
URL: https://github.com/apache/arrow-rs/issues/9298

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   - related to https://github.com/apache/arrow-rs/issues/9061
   
   While we work on micro-optimizations, we have seen a common pattern 
where older parts of the arrow-rs codebase use `ArrayData` to create new 
arrays. 
   
   An `ArrayData` has at least one extra allocation (for the `Vec` that holds 
the `Buffer`s) as well as a bunch of dynamic function calls. While this overhead 
is small individually, it is paid for every array, so in aggregate it can be 
substantial.
   
   It also typically requires an `unsafe` call which is unnecessary as the new 
APIs can be checked by the compiler.
   
   Quoting @tustvold 
   
   > My 2 cents is it would be better to move the codepaths relying on 
ArrayData over to using the typed arrays directly, this should not only cut 
down on allocations but unnecessary validation and dispatch overheads.
   
   
   **Describe the solution you'd like**
   Change the code paths that rely on `ArrayData` to create the typed arrays 
directly. This should cut down not only on allocations but also on unnecessary 
validation and dispatch overhead.
   
   
   
   **Describe alternatives you've considered**
   Here are some example PRs
   - https://github.com/apache/arrow-rs/pull/9122
   - https://github.com/apache/arrow-rs/pull/9120
   
   The old, less efficient pattern looks like this (note the `vec![buffer]`, 
which allocates a `Vec` just to hold the buffer):
   
   ```rust
           let data = unsafe {
               ArrayData::new_unchecked(T::DATA_TYPE, len, None, Some(null), 0, vec![buffer], vec![])
           };
           PrimitiveArray::from(data)
   ```
   
   or
   
   ```rust
           let array_data = ArrayDataBuilder::new(arrow_data_type)
               .len(self.record_reader.num_values())
               .add_buffer(record_data)
               .null_bit_buffer(self.record_reader.consume_bitmap_buffer());
   
           let array_data = unsafe { array_data.build_unchecked() };
   ```
   
   The new pattern looks like this (note: no `unsafe` and no extra allocation):
   
   ```rust
           // Create nulls directly (note the `filter`, which drops the
           // null buffer when it contains no nulls)
           let nulls = Some(NullBuffer::new(BooleanBuffer::new(null, 0, len)))
               .filter(|n| n.null_count() > 0);
           // Create Primitive Array directly
           PrimitiveArray::new(ScalarBuffer::from(buffer), nulls)
   ```
   
   **Note:** the only tricky thing I have seen is that `ArrayDataBuilder` 
automatically checks for and drops `NullBuffer`s that have no nulls. When 
updating the code we need to follow a similar pattern.
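   
   To make the null-dropping rule concrete, here is a minimal std-only sketch 
of the check. It does not use the arrow-rs API: `null_count` and 
`non_empty_validity` are hypothetical helpers, and the bitmap is modelled as a 
plain `Vec<u8>` assuming the Arrow convention that a set bit means "valid" and 
bits are read least-significant first.
   
   ```rust
   /// Counts the null slots among the first `len` bits of a validity bitmap.
   /// Assumes the Arrow convention: bit set = valid, least-significant bit first.
   fn null_count(validity: &[u8], len: usize) -> usize {
       (0..len)
           .filter(|&i| validity[i / 8] & (1u8 << (i % 8)) == 0)
           .count()
   }
   
   /// Hypothetical helper: keep the bitmap only if it marks at least one null,
   /// otherwise return `None` so the array carries no null buffer at all.
   fn non_empty_validity(validity: Vec<u8>, len: usize) -> Option<Vec<u8>> {
       Some(validity).filter(|v| null_count(v, len) > 0)
   }
   ```
   
   With the typed array APIs, the same shape is the 
`.filter(|n| n.null_count() > 0)` call on the `Option<NullBuffer>` shown in the 
new pattern above.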
   
   **Additional context**
   - [ ] https://github.com/apache/arrow-rs/issues/9128
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
