alamb commented on code in PR #7614:
URL: https://github.com/apache/arrow-rs/pull/7614#discussion_r2132665183
##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -201,10 +201,40 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
let b = b.get_unchecked(start..end);
let view = make_view(b, block, offset);
- self.views_builder.append(view);
+ self.views_buffer.push(view);
self.null_buffer_builder.append_non_null();
}
+ /// Appends an array to the builder.
+ /// This will flush any in-progress block and append the data buffers
+ /// and add the (adapted) views.
+ pub fn append_array(&mut self, array: &GenericByteViewArray<T>) {
+ self.flush_in_progress();
+ self.completed.extend(array.data_buffers().iter().cloned());
+
+ if self.completed.is_empty() {
Review Comment:
I couldn't resist and tried the following change locally and saw some
additional improvements in concat but also some slowdowns with entirely inlined
views, which I don't understand 🤔
```diff
diff --git a/arrow-array/src/builder/generic_bytes_view_builder.rs
b/arrow-array/src/builder/generic_bytes_view_builder.rs
index 1e06cd7698..3936ca06d4 100644
--- a/arrow-array/src/builder/generic_bytes_view_builder.rs
+++ b/arrow-array/src/builder/generic_bytes_view_builder.rs
@@ -121,7 +121,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T>
{
/// growing buffer size exponentially from 8KB up to 2MB. The
/// first buffer allocated is 8KB, then 16KB, then 32KB, etc up to 2MB.
///
- /// If this method is used, any new buffers allocated are
+ /// If this method is used, any new buffers allocated are
/// exactly this size. This can be useful for advanced users
/// that want to control the memory usage and buffer count.
///
@@ -210,9 +210,11 @@ impl<T: ByteViewType + ?Sized>
GenericByteViewBuilder<T> {
/// and add the (adapted) views.
pub fn append_array(&mut self, array: &GenericByteViewArray<T>) {
self.flush_in_progress();
+ let update_views = !(self.completed.is_empty() ||
array.data_buffers().is_empty());
+
self.completed.extend(array.data_buffers().iter().cloned());
- if self.completed.is_empty() {
+ if update_views {
self.views_buffer.extend_from_slice(array.views());
} else {
let starting_buffer = self.completed.len() as u32;
```
```
group
coalesce_improvements coalesce_improvements_fix
-----
--------------------- -------------------------
concat utf8_view max_str_len=128 null_density=0 1.53
80.4±0.42µs ? ?/sec 1.00 52.5±2.85µs ? ?/sec
concat utf8_view max_str_len=128 null_density=0.2 1.51
86.4±0.34µs ? ?/sec 1.00 57.3±2.97µs ? ?/sec
concat utf8_view max_str_len=20 null_density=0 1.58
80.2±0.36µs ? ?/sec 1.00 50.6±2.33µs ? ?/sec
concat utf8_view max_str_len=20 null_density=0.2 1.65
98.9±0.36µs ? ?/sec 1.00 59.9±2.19µs ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0 1.00
40.6±2.82µs ? ?/sec 1.94 78.7±1.18µs ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2 1.00
53.9±2.78µs ? ?/sec 1.95 105.1±0.26µs ? ?/sec
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]