martin-g commented on code in PR #21586:
URL: https://github.com/apache/datafusion/pull/21586#discussion_r3072801124


##########
datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:
##########
@@ -179,6 +181,134 @@ impl<B: ByteViewType> ByteViewGroupValueBuilder<B> {
         }
     }
 
+    fn vectorized_append_non_null_views(
+        &mut self,
+        array: &GenericByteViewArray<B>,
+        rows: &[usize],
+    ) {
+        let source_views = array.views();
+        self.views.reserve(rows.len());
+
+        if array.data_buffers().is_empty() {
+            self.views.extend(rows.iter().map(|&row| source_views[row]));
+            return;
+        }
+
+        let start_idx = self.views.len();
+        self.views.extend(rows.iter().map(|&row| source_views[row]));

Review Comment:
   This `extend` could be moved into the loop at line 202, so that the views 
are pushed during the existing iteration over `rows` instead of in a 
separate pass.
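
   A minimal sketch of the single-pass idea, assuming plain `u128` views and a 
hypothetical `PendingCopy` struct standing in for the PR's 
`PendingByteViewCopy`. The function name `append_views_single_pass` is also 
illustrative and not part of the PR. Whether a `push` inside the loop beats 
the bulk `extend` would need to be benchmarked.

```rust
/// Hypothetical stand-in for the PR's `PendingByteViewCopy`.
#[derive(Debug, PartialEq)]
struct PendingCopy {
    dest_index: usize,
    view: u128,
}

/// Views of at most 12 bytes are stored inline; the length is the low 32 bits.
const MAX_INLINE_LEN: u32 = 12;

/// Appends the selected views to `dest` and records the long ones in one pass.
fn append_views_single_pass(
    dest: &mut Vec<u128>,
    source_views: &[u128],
    rows: &[usize],
) -> Vec<PendingCopy> {
    dest.reserve(rows.len());
    let mut pending = Vec::new();
    for &row in rows {
        let view = source_views[row];
        if (view as u32) > MAX_INLINE_LEN {
            // `dest.len()` is the index this view is about to occupy.
            pending.push(PendingCopy { dest_index: dest.len(), view });
        }
        dest.push(view);
    }
    pending
}
```

   With this shape `start_idx` is no longer needed, because the destination 
index is read from `dest.len()` just before each push.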



##########
datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:
##########
@@ -179,6 +181,134 @@ impl<B: ByteViewType> ByteViewGroupValueBuilder<B> {
         }
     }
 
+    fn vectorized_append_non_null_views(
+        &mut self,
+        array: &GenericByteViewArray<B>,
+        rows: &[usize],
+    ) {
+        let source_views = array.views();
+        self.views.reserve(rows.len());
+
+        if array.data_buffers().is_empty() {
+            self.views.extend(rows.iter().map(|&row| source_views[row]));
+            return;
+        }
+
+        let start_idx = self.views.len();
+        self.views.extend(rows.iter().map(|&row| source_views[row]));
+
+        let mut pending = Vec::with_capacity(rows.len());

Review Comment:
   This will most likely over-allocate, because only the views longer than 
12 bytes are appended to `pending`. Consider using half of `rows.len()` as 
the initial capacity and letting the vector grow if needed.
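
   A rough sketch of the smaller initial capacity, assuming a hypothetical 
helper `collect_long_rows` that works on plain lengths rather than real 
views. The halving is only a guess at the share of long values; `Vec` 
reallocates on its own if the guess is too low.

```rust
/// Returns the positions within `rows` whose value is longer than 12 bytes.
/// `lengths` is a hypothetical stand-in for the length field of each view.
fn collect_long_rows(lengths: &[u32], rows: &[usize]) -> Vec<usize> {
    // Assumption: about half of the rows are long. This is a starting
    // capacity only, the vector still grows when more entries are pushed.
    let mut pending = Vec::with_capacity(rows.len() / 2);
    for (idx, &row) in rows.iter().enumerate() {
        if lengths[row] > 12 {
            pending.push(idx);
        }
    }
    pending
}
```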



##########
datafusion/physical-plan/src/aggregates/group_values/multi_group_by/bytes_view.rs:
##########
@@ -179,6 +181,134 @@ impl<B: ByteViewType> ByteViewGroupValueBuilder<B> {
         }
     }
 
+    fn vectorized_append_non_null_views(
+        &mut self,
+        array: &GenericByteViewArray<B>,
+        rows: &[usize],
+    ) {
+        let source_views = array.views();
+        self.views.reserve(rows.len());
+
+        if array.data_buffers().is_empty() {
+            self.views.extend(rows.iter().map(|&row| source_views[row]));
+            return;
+        }
+
+        let start_idx = self.views.len();
+        self.views.extend(rows.iter().map(|&row| source_views[row]));
+
+        let mut pending = Vec::with_capacity(rows.len());
+        for (idx, &row) in rows.iter().enumerate() {
+            let view = source_views[row];
+            if (view as u32) > 12 {
+                pending.push(PendingByteViewCopy {
+                    dest_index: start_idx + idx,
+                    source: ByteView::from(view),
+                });
+            }
+        }
+
+        self.batch_copy_long_views(array.data_buffers(), &pending);
+    }
+
+    fn vectorized_append_views_with_nulls(
+        &mut self,
+        array: &GenericByteViewArray<B>,
+        rows: &[usize],
+    ) {
+        let source_views = array.views();
+        let mut pending = Vec::with_capacity(rows.len());

Review Comment:
   Same here: consider using a smaller initial capacity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
