alamb commented on code in PR #7614:
URL: https://github.com/apache/arrow-rs/pull/7614#discussion_r2132665183


##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -201,10 +201,40 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
         let b = b.get_unchecked(start..end);
 
         let view = make_view(b, block, offset);
-        self.views_builder.append(view);
+        self.views_buffer.push(view);
         self.null_buffer_builder.append_non_null();
     }
 
+    /// Appends an array to the builder.
+    /// This will flush any in-progress block and append the data buffers
+    /// and add the (adapted) views.
+    pub fn append_array(&mut self, array: &GenericByteViewArray<T>) {
+        self.flush_in_progress();
+        self.completed.extend(array.data_buffers().iter().cloned());
+
+        if self.completed.is_empty() {

Review Comment:
   I couldn't resist and tried the following change locally and saw some 
additional improvements in concat but also some slowdowns with entirely inlined 
views, which I don't understand 🤔 
   
   ```diff
   diff --git a/arrow-array/src/builder/generic_bytes_view_builder.rs 
b/arrow-array/src/builder/generic_bytes_view_builder.rs
   index 1e06cd7698..3936ca06d4 100644
   --- a/arrow-array/src/builder/generic_bytes_view_builder.rs
   +++ b/arrow-array/src/builder/generic_bytes_view_builder.rs
   @@ -121,7 +121,7 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> 
{
        /// growing buffer size exponentially from 8KB up to 2MB. The
        /// first buffer allocated is 8KB, then 16KB, then 32KB, etc up to 2MB.
        ///
   -    /// If this method is used, any new buffers allocated are
   +    /// If this method is used, any new buffers allocated are
        /// exactly this size. This can be useful for advanced users
        /// that want to control the memory usage and buffer count.
        ///
   @@ -210,9 +210,11 @@ impl<T: ByteViewType + ?Sized> 
GenericByteViewBuilder<T> {
        /// and add the (adapted) views.
        pub fn append_array(&mut self, array: &GenericByteViewArray<T>) {
            self.flush_in_progress();
   +        let update_views = !(self.completed.is_empty() || 
array.data_buffers().is_empty());
   +
            self.completed.extend(array.data_buffers().iter().cloned());
   
   -        if self.completed.is_empty() {
   +        if update_views {
                self.views_buffer.extend_from_slice(array.views());
            } else {
                let starting_buffer = self.completed.len() as u32;
   ```
   
   ```
   group                                                          
coalesce_improvements                  coalesce_improvements_fix
   -----                                                          
---------------------                  -------------------------
   concat utf8_view  max_str_len=128 null_density=0               1.53     
80.4±0.42µs        ? ?/sec    1.00     52.5±2.85µs        ? ?/sec
   concat utf8_view  max_str_len=128 null_density=0.2             1.51     
86.4±0.34µs        ? ?/sec    1.00     57.3±2.97µs        ? ?/sec
   concat utf8_view  max_str_len=20 null_density=0                1.58     
80.2±0.36µs        ? ?/sec    1.00     50.6±2.33µs        ? ?/sec
   concat utf8_view  max_str_len=20 null_density=0.2              1.65     
98.9±0.36µs        ? ?/sec    1.00     59.9±2.19µs        ? ?/sec
   concat utf8_view all_inline max_str_len=12 null_density=0      1.00     
40.6±2.82µs        ? ?/sec    1.94     78.7±1.18µs        ? ?/sec
   concat utf8_view all_inline max_str_len=12 null_density=0.2    1.00     
53.9±2.78µs        ? ?/sec    1.95    105.1±0.26µs        ? ?/sec
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to