Re: [PR] chore: fix native shuffle for batches with no columns and 0 row count [datafusion-comet]

via GitHub Thu, 02 Apr 2026 14:35:11 -0700


comphead commented on code in PR #3858:
URL: https://github.com/apache/datafusion-comet/pull/3858#discussion_r3030485885



##########
native/shuffle/src/partitioners/multi_partition.rs:
##########
@@ -203,6 +203,20 @@ impl MultiPartitionShuffleRepartitioner {
             return Ok(());
         }
 
+        // For zero-column schemas (e.g. COUNT queries), assign all rows to partition 0.
+        if input.num_columns() == 0 {
+            let num_rows = input.num_rows();
+            self.metrics.baseline.record_output(num_rows);
+            let batch_idx = self.buffered_batches.len() as u32;
+            self.buffered_batches.push(input);
+            let indices = &mut self.partition_indices[0];
+            indices.reserve(num_rows);

Review Comment:
   There is another potential micro-optimization that would save some memory on the indices. Instead of storing partition indices for each row:
   ```
   [(1, 0), (1, 0), (1, 0), (1, 0)]
   ```
   
   we could transfer just an offset of 4 and have the reader recreate the batches accordingly. However, this would add code complexity for only a tiny reduction in memory usage.
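
A minimal sketch of the trade-off described above, not taken from the Comet code base. It assumes each stored entry is a `(batch_idx, row_idx)` pair of `u32` values; `RowRun`, `per_row_indices`, and `expand` are hypothetical names introduced only for illustration.

```rust
/// Per-row representation: one (batch_idx, row_idx) pair for every row.
/// The tuple layout is an assumption made for this sketch.
fn per_row_indices(batch_idx: u32, num_rows: u32) -> Vec<(u32, u32)> {
    (0..num_rows).map(|row_idx| (batch_idx, row_idx)).collect()
}

/// Compact representation (hypothetical): only the batch and its row count.
/// This is enough for a zero-column batch, where rows carry no data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RowRun {
    batch_idx: u32,
    num_rows: u32,
}

/// What a reader would do to recover the per-row pairs from the compact form.
fn expand(run: RowRun) -> Vec<(u32, u32)> {
    per_row_indices(run.batch_idx, run.num_rows)
}
```

Under these assumptions, a 4-row batch needs four 8-byte pairs in the per-row form, while a `RowRun` stays at 8 bytes regardless of the row count. The cost is the extra expansion step on the reader side, which is the added complexity mentioned above.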



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


