alamb commented on code in PR #7614:
URL: https://github.com/apache/arrow-rs/pull/7614#discussion_r2132374323


##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -201,10 +201,40 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
         let b = b.get_unchecked(start..end);
 
         let view = make_view(b, block, offset);
-        self.views_builder.append(view);
+        self.views_buffer.push(view);
         self.null_buffer_builder.append_non_null();
     }
 
+    /// Appends an array to the builder.
+    /// This will flush any in-progress block and append the data buffers
+    /// and add the (adapted) views.
+    pub fn append_array(&mut self, array: &GenericByteViewArray<T>) {
+        self.flush_in_progress();
+        self.completed.extend(array.data_buffers().iter().cloned());
+
+        if self.completed.is_empty() {

Review Comment:
   This checks `completed.is_empty()` *after* pushing the new data buffers, which
I think means the fast path will never be taken (unless the incoming array has
no data buffers). The check could be done before calling
`self.completed.extend` to improve performance.
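
A minimal sketch of the reordering suggested above, using simplified stand-in types rather than the actual arrow-rs builder. `View`, `Builder`, and the `fast_path_hits` counter are hypothetical names introduced only for illustration. The sketch assumes the buffer-index offset should also be captured before `completed` is extended.

```rust
// Simplified stand-ins for illustration; these are NOT the arrow-rs types.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    length: u32,
    buffer_index: u32,
}

#[derive(Default)]
struct Builder {
    completed: Vec<Vec<u8>>,
    views: Vec<View>,
    // Hypothetical counter, present only to make the fast path observable.
    fast_path_hits: usize,
}

impl Builder {
    fn append_array(&mut self, buffers: &[Vec<u8>], views: &[View]) {
        // Capture the number of existing buffers *before* extending `completed`.
        let starting_buffer = self.completed.len() as u32;
        self.completed.extend(buffers.iter().cloned());

        if starting_buffer == 0 {
            // No prior buffers: the incoming buffer indices are already
            // correct, so the views can be copied unchanged.
            self.views.extend_from_slice(views);
            self.fast_path_hits += 1;
        } else {
            self.views.extend(views.iter().map(|v| {
                let mut v = *v;
                // Views of <= 12 bytes are inlined and do not reference a
                // buffer, so only larger views need their index shifted.
                if v.length > 12 {
                    v.buffer_index += starting_buffer;
                }
                v
            }));
        }
    }
}

fn main() {
    let long = View { length: 20, buffer_index: 0 };
    let short = View { length: 5, buffer_index: 0 };

    let mut b = Builder::default();
    b.append_array(&[vec![0u8; 20]], &[long, short]); // builder empty: fast path
    b.append_array(&[vec![0u8; 20]], &[long, short]); // builder non-empty: remap

    let indices: Vec<u32> = b.views.iter().map(|v| v.buffer_index).collect();
    println!("fast path hits: {}", b.fast_path_hits);
    println!("buffer indices: {:?}", indices);
}
```

With the check based on the pre-extend length, the first append into an empty builder takes the copy-only branch, and later appends shift only the views that point into a data buffer.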



##########
arrow-select/src/concat.rs:
##########
@@ -84,6 +86,22 @@ fn fixed_size_list_capacity(arrays: &[&dyn Array], data_type: &DataType) -> Capa
     }
 }
 
+fn concat_byte_view(arrays: &[&dyn Array]) -> Result<ArrayRef, ArrowError> {

Review Comment:
   Very minor: you could make this generic over `ByteViewType` rather than
having two explicit functions.



##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -201,10 +201,40 @@ impl<T: ByteViewType + ?Sized> GenericByteViewBuilder<T> {
         let b = b.get_unchecked(start..end);
 
         let view = make_view(b, block, offset);
-        self.views_builder.append(view);
+        self.views_buffer.push(view);
         self.null_buffer_builder.append_non_null();
     }
 
+    /// Appends an array to the builder.
+    /// This will flush any in-progress block and append the data buffers
+    /// and add the (adapted) views.
+    pub fn append_array(&mut self, array: &GenericByteViewArray<T>) {
+        self.flush_in_progress();
+        self.completed.extend(array.data_buffers().iter().cloned());
+
+        if self.completed.is_empty() {
+            self.views_buffer.extend_from_slice(array.views());
+        } else {
+            let starting_buffer = self.completed.len() as u32;
+
+            self.views_buffer.extend(array.views().iter().map(|v| {
+                let mut byte_view = ByteView::from(*v);
+                if byte_view.length > 12 {
+                    // If the view is small enough, we can inline it

Review Comment:
   ```suggestion
                       // Small views (<=12 bytes) are inlined, so only need to update large views
   ```



##########
arrow-array/src/builder/generic_bytes_view_builder.rs:
##########
@@ -79,7 +79,7 @@ impl BlockSizeGrowthStrategy {
 /// using [`GenericByteViewBuilder::append_block`] and then views into this block appended
 /// using [`GenericByteViewBuilder::try_append_view`]
 pub struct GenericByteViewBuilder<T: ByteViewType + ?Sized> {
-    views_builder: BufferBuilder<u128>,
+    views_buffer: Vec<u128>,

Review Comment:
   this is a great idea 💯 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
