comphead commented on code in PR #3858:
URL: https://github.com/apache/datafusion-comet/pull/3858#discussion_r3030485885


##########
native/shuffle/src/partitioners/multi_partition.rs:
##########
@@ -203,6 +203,20 @@ impl MultiPartitionShuffleRepartitioner {
             return Ok(());
         }
 
+        // For zero-column schemas (e.g. COUNT queries), assign all rows to partition 0.
+        if input.num_columns() == 0 {
+            let num_rows = input.num_rows();
+            self.metrics.baseline.record_output(num_rows);
+            let batch_idx = self.buffered_batches.len() as u32;
+            self.buffered_batches.push(input);
+            let indices = &mut self.partition_indices[0];
+            indices.reserve(num_rows);

Review Comment:
   There is another potential micro-optimization that would shrink the stored 
indices. Instead of storing a partition index entry for each row:
   ```
   [(1, 0), (1, 0), (1, 0), (1, 0)]
   ```
   
   we could transfer just an offset of 4 and have the reader recreate the 
batches accordingly, but this would add code complexity for only a tiny 
memory saving.
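
A minimal standalone sketch of that idea, not taken from the PR: `IndexEntry`, `EntryRun`, `compress`, and `expand` are hypothetical names, and the two-field entry layout is only assumed from the `(1, 0)` pairs above. The writer collapses identical per-row entries into one entry plus a count, and the reader expands it back.

```rust
/// Hypothetical index entry; the two-field layout mirrors the `(1, 0)`
/// pairs in the comment above and is an assumption, not Comet's actual type.
type IndexEntry = (u32, u32);

/// Hypothetical compact form: one entry plus how many times it repeats.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EntryRun {
    entry: IndexEntry,
    count: usize,
}

/// Writer side: collapse a slice of identical entries into a single run.
/// Returns `None` if the slice is empty or the entries are not all equal.
fn compress(entries: &[IndexEntry]) -> Option<EntryRun> {
    let first = *entries.first()?;
    if entries.iter().all(|e| *e == first) {
        Some(EntryRun { entry: first, count: entries.len() })
    } else {
        None
    }
}

/// Reader side: recreate the per-row entries from the run.
fn expand(run: EntryRun) -> Vec<IndexEntry> {
    vec![run.entry; run.count]
}

fn main() {
    let per_row: Vec<IndexEntry> = vec![(1, 0), (1, 0), (1, 0), (1, 0)];
    let run = compress(&per_row).expect("all entries are identical");
    assert_eq!(run, EntryRun { entry: (1, 0), count: 4 });
    assert_eq!(expand(run), per_row);
    // Compare the in-memory footprint of the two representations.
    println!(
        "per-row: {} bytes, run: {} bytes",
        per_row.len() * std::mem::size_of::<IndexEntry>(),
        std::mem::size_of::<EntryRun>()
    );
}
```

The saving grows with the number of rows in the batch, but the reader needs a second code path to expand runs, which is the complexity trade-off mentioned above.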



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

