alamb opened a new issue, #9298: URL: https://github.com/apache/arrow-rs/issues/9298
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

- related to https://github.com/apache/arrow-rs/issues/9061

While we work on micro-optimizations, we have seen a common pattern where older parts of the arrow-rs codebase use `ArrayData` to create new arrays. An `ArrayData` has at least one extra allocation (for the `Vec` that holds the `Buffer`s), as well as a bunch of dynamic function calls. This overhead is small for each array. However, it is paid for every array, so in aggregate it can be substantial.

It also typically requires an `unsafe` call. That is unnecessary, because the compiler can check the new APIs.

Quoting @tustvold

> My 2 cents is it would be better to move the codepaths relying on ArrayData over to using the typed arrays directly, this should not only cut down on allocations but unnecessary validation and dispatch overheads.

**Describe the solution you'd like**

Change code that relies on `ArrayData` so that it creates the typed arrays directly. This should cut down on allocations, and also on unnecessary validation and dispatch overheads.

**Describe alternatives you've considered**

Here are some example PRs:

- https://github.com/apache/arrow-rs/pull/9122
- https://github.com/apache/arrow-rs/pull/9120

The old, less efficient pattern looks like this (note the `vec![buffer]`, which allocates a `Vec` just to hold the buffer):
```rust
let data = unsafe {
    ArrayData::new_unchecked(T::DATA_TYPE, len, None, Some(null), 0, vec![buffer], vec![])
};
PrimitiveArray::from(data)
```

or

```rust
let array_data = ArrayDataBuilder::new(arrow_data_type)
    .len(self.record_reader.num_values())
    .add_buffer(record_data)
    .null_bit_buffer(self.record_reader.consume_bitmap_buffer());
let array_data = unsafe { array_data.build_unchecked() };
```

The new pattern looks like this (note that there is no `unsafe` and no extra allocation):

```rust
// Create the nulls directly (the `filter` drops a NullBuffer that contains no nulls)
let nulls = Some(NullBuffer::new(BooleanBuffer::new(null, 0, len))).filter(|n| n.null_count() > 0);
// Create the PrimitiveArray directly
PrimitiveArray::new(ScalarBuffer::from(buffer), nulls)
```

**Note:** The only tricky thing I have seen is that `ArrayDataBuilder` automatically checks for `NullBuffer`s that have no nulls and drops them. When updating the code, we need to follow a similar pattern.

**Additional context**

- [ ] https://github.com/apache/arrow-rs/issues/9128
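The rule that a validity bitmap with no nulls is dropped is easy to miss during a migration. Below is a minimal, std-only sketch of that rule. `Validity`, `Int32ArraySketch`, and `build` are hypothetical stand-ins for arrow-rs's `NullBuffer` and `PrimitiveArray`, not their real definitions. The sketch assumes an Arrow-style LSB-first validity bitmap in which a set bit means "valid". It shows two things: the `Option::filter` idiom, and a checked constructor that panics on a length mismatch rather than relying on `unsafe`.

```rust
/// Hypothetical stand-in for arrow-rs `NullBuffer` (assumption, not the real type):
/// an LSB-first bitmap where a set bit means "valid" and a clear bit means "null".
#[derive(Debug, Clone, PartialEq)]
pub struct Validity {
    bits: Vec<u8>,
    len: usize,
}

impl Validity {
    pub fn new(bits: Vec<u8>, len: usize) -> Self {
        assert!(bits.len() * 8 >= len, "bitmap too short for len");
        Validity { bits, len }
    }

    /// True if slot `i` is valid (its bit is set).
    pub fn is_valid(&self, i: usize) -> bool {
        self.bits[i / 8] & (1 << (i % 8)) != 0
    }

    /// Number of null (clear) slots among the first `len` bits.
    pub fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }
}

/// Hypothetical typed array (assumption): values plus optional validity.
#[derive(Debug)]
pub struct Int32ArraySketch {
    values: Vec<i32>,
    nulls: Option<Validity>,
}

impl Int32ArraySketch {
    /// Checked constructor: panics on a length mismatch instead of
    /// trusting the caller through an `unsafe` block.
    pub fn new(values: Vec<i32>, nulls: Option<Validity>) -> Self {
        if let Some(n) = &nulls {
            assert_eq!(n.len, values.len(), "validity length must match values");
        }
        Int32ArraySketch { values, nulls }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }

    pub fn null_count(&self) -> usize {
        self.nulls.as_ref().map_or(0, |n| n.null_count())
    }
}

/// Hypothetical helper: build from raw parts and drop a bitmap that has
/// no nulls, using the same `Option::filter` idiom as the issue.
pub fn build(values: Vec<i32>, bitmap: Vec<u8>) -> Int32ArraySketch {
    let len = values.len();
    let nulls = Some(Validity::new(bitmap, len)).filter(|n| n.null_count() > 0);
    Int32ArraySketch::new(values, nulls)
}
```

For example, `build(vec![1, 2, 3], vec![0b0000_0111])` stores no validity at all, because every bit is set. `build(vec![1, 2, 3], vec![0b0000_0101])` keeps the bitmap, with one null at index 1. With the real crates, the corresponding step is the `Some(NullBuffer::new(...)).filter(|n| n.null_count() > 0)` line shown in the issue.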
