alamb commented on code in PR #9393:
URL: https://github.com/apache/arrow-rs/pull/9393#discussion_r2795101299
##########
arrow-select/src/interleave.rs:
##########
@@ -155,12 +157,12 @@ fn interleave_primitive<T: ArrowPrimitiveType>(
) -> Result<ArrayRef, ArrowError> {
let interleaved = Interleave::<'_, PrimitiveArray<T>>::new(values,
indices);
- let values = indices
+ let values: ScalarBuffer<T::Native> = indices
.iter()
.map(|(a, b)| interleaved.arrays[*a].value(*b))
- .collect::<Vec<_>>();
+ .collect();
- let array = PrimitiveArray::<T>::try_new(values.into(),
interleaved.nulls)?;
+ let array = PrimitiveArray::<T>::try_new(values, interleaved.nulls)?;
Review Comment:
Basically there are a bunch of places in the code I think that make
ScalarBuffer from `Vec`s and assume that is fast. If we can make it faster,
maybe we can optimize that path (in one place) rather than having to change all
call sites to use collect `ScalarBuffer` directly 🤔
##########
arrow-select/src/interleave.rs:
##########
@@ -155,12 +157,12 @@ fn interleave_primitive<T: ArrowPrimitiveType>(
) -> Result<ArrayRef, ArrowError> {
let interleaved = Interleave::<'_, PrimitiveArray<T>>::new(values,
indices);
- let values = indices
+ let values: ScalarBuffer<T::Native> = indices
.iter()
.map(|(a, b)| interleaved.arrays[*a].value(*b))
- .collect::<Vec<_>>();
+ .collect();
- let array = PrimitiveArray::<T>::try_new(values.into(),
interleaved.nulls)?;
+ let array = PrimitiveArray::<T>::try_new(values, interleaved.nulls)?;
Review Comment:
I don't understand why this would help
The old code did `ScalarBuffer::from`
https://github.com/apache/arrow-rs/blob/f9dd799f7baf9731ee01027c10cf6b64c94a793f/arrow-buffer/src/buffer/scalar.rs#L205-L212
Which calls `Buffer::from_vec`
Which makes a MutableBuffer by taking over the Vec allocation
https://github.com/apache/arrow-rs/blob/08e34ba2710abff8b6119a043ef8d5d0102b2b8e/arrow-buffer/src/buffer/mutable.rs#L813-L830
Which then makes a buffer
So TLDR is I don't understand why this saves a memory copy
Or is the theory that all the shenanigans to make Vec -> MutableBuffer ->
Bytes/Buffer can be reduced?
##########
arrow-buffer/src/buffer/mutable.rs:
##########
@@ -101,14 +101,25 @@ pub struct MutableBuffer {
data: NonNull<u8>,
// invariant: len <= capacity
len: usize,
- layout: Layout,
+ capacity: usize,
Review Comment:
What is the rationale for inlining Layout here?
It seems like Layout
https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html has the same
representation and has accessors to get capacity and alignment 🤔 as usize
https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html#method.size
https://doc.rust-lang.org/beta/std/alloc/struct.Layout.html#method.align
If we reverted this change to Layout I think the diff would be easier to
understand
##########
arrow-buffer/src/buffer/mutable.rs:
##########
@@ -101,14 +101,25 @@ pub struct MutableBuffer {
data: NonNull<u8>,
// invariant: len <= capacity
len: usize,
- layout: Layout,
+ capacity: usize,
+ // cached alignment, Layout reconstructed when needed
+ align: usize,
/// Memory reservation for tracking memory usage
#[cfg(feature = "pool")]
reservation: Mutex<Option<Box<dyn MemoryReservation>>>,
}
impl MutableBuffer {
+ /// Reconstruct Layout from cached capacity and alignment.
+ /// Only used in cold paths (alloc/dealloc/realloc).
+ #[inline]
+ fn layout(&self) -> Layout {
+ debug_assert!(self.align.is_power_of_two());
Review Comment:
See comment above -- I don't understand the rationale for inlining the
`Layout`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]