Re: [PR] Compact StringView buffer during sparse (<50%) take to avoid holding the original buffers alive [arrow-rs]

via GitHub Wed, 11 Feb 2026 11:32:07 -0800


Dandandan commented on code in PR #9391:
URL: https://github.com/apache/arrow-rs/pull/9391#discussion_r2795090543



##########
arrow-select/src/take.rs:
##########
@@ -608,6 +616,141 @@ fn take_byte_view<T: ByteViewType, IndexType: ArrowPrimitiveType>(
     })
 }
 
+/// `take` implementation for byte view arrays that compacts string data into
+/// new buffers rather than sharing the original buffers.
+///
+/// This fuses the gather (take) with string compaction in a single pass,
+/// producing an output array whose buffers contain only the referenced data.
+/// This is beneficial when `take` selects a small fraction of the source array,
+/// as it avoids keeping the original large buffers alive.
+///
+/// The output uses multiple buffers if a single buffer would exceed `u32::MAX`
+/// bytes, ensuring `ByteView::offset` never overflows.
+///
+/// # Safety contract
+/// Callers must ensure that all non-null indices are within bounds of
+/// `array` (i.e. `< array.len()`). This is guaranteed when called via
+/// `take()` with `check_bounds` enabled, or when the caller otherwise
+/// validates indices. Out-of-bounds indices will cause a panic (indexing
+/// `src_views`) or UB (via `get_unchecked` on `src_buffers`).
+#[inline(never)]
+fn take_byte_view_compact<T: ByteViewType, IndexType: ArrowPrimitiveType>(

Review Comment:
   Can we add a benchmark case to cover this?
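
   For readers following the thread, here is a rough sketch of the behaviour the doc comment describes (gather plus compaction in one pass, rolling over to a new buffer when a size limit is reached). It is a std-only model, not the PR's code: `View`, `take_compact`, and `max_buffer_len` are hypothetical names made up for illustration, and the sketch ignores the inlining of short strings that real Arrow views perform.

```rust
/// Simplified stand-in for a string view (hypothetical; not arrow-rs `ByteView`).
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    buffer_index: u32,
    offset: u32,
    len: u32,
}

/// Gathers `indices` from `views` and copies only the referenced bytes into
/// fresh buffers, so the output does not reference the source buffers.
/// A new output buffer is started when appending would exceed `max_buffer_len`
/// (assumed here to play the role of the `u32::MAX` limit in the doc comment).
/// Panics if a non-null index is out of bounds.
fn take_compact(
    views: &[View],
    buffers: &[Vec<u8>],
    indices: &[Option<usize>],
    max_buffer_len: usize,
) -> (Vec<Option<View>>, Vec<Vec<u8>>) {
    let mut out_views = Vec::with_capacity(indices.len());
    let mut out_buffers: Vec<Vec<u8>> = vec![Vec::new()];

    for &idx in indices {
        // Null indices produce null output slots and copy no data.
        let Some(i) = idx else {
            out_views.push(None);
            continue;
        };

        let v = views[i];
        let start = v.offset as usize;
        let bytes = &buffers[v.buffer_index as usize][start..start + v.len as usize];

        // Roll over to a new buffer if this value would not fit.
        let current_len = out_buffers.last().map_or(0, |b| b.len());
        if current_len > 0 && current_len + bytes.len() > max_buffer_len {
            out_buffers.push(Vec::new());
        }

        let buffer_index = (out_buffers.len() - 1) as u32;
        let current = &mut out_buffers[buffer_index as usize];
        let offset = current.len() as u32;
        current.extend_from_slice(bytes);

        out_views.push(Some(View { buffer_index, offset, len: v.len }));
    }

    (out_views, out_buffers)
}
```

   A benchmark case would presumably compare something shaped like this against the buffer-sharing path at low selectivity, where compaction is expected to pay off.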



