mbutrovich commented on code in PR #3858:
URL: https://github.com/apache/datafusion-comet/pull/3858#discussion_r3029613996
##########
native/shuffle/src/partitioners/multi_partition.rs:
##########
@@ -203,6 +203,36 @@ impl MultiPartitionShuffleRepartitioner {
return Ok(());
}
+ // For zero-column schemas (e.g. COUNT queries), assign all rows to partition 0.
+ // No hashing or expression evaluation needed — just route through normal buffering.
+ if input.num_columns() == 0 {
+ let num_rows = input.num_rows();
+ self.metrics.baseline.record_output(num_rows);
+ // All rows go to partition 0: partition_starts = [0, num_rows, num_rows, ...]
+ // partition_row_indices = [0, 1, 2, ..., num_rows-1]
+ let mut scratch = std::mem::take(&mut self.scratch);
Review Comment:
This still looks far more complicated than I would expect. Why do we need
scratch space, and why write `num_rows` entries into `partition_row_indices`?
Why are we "partitioning" rows that don't exist?
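
To make the concern concrete, here is a minimal standalone sketch, not Comet
code. Both function names are hypothetical, and the `u32` index type and the
`num_partitions + 1` length of `partition_starts` are assumptions read off the
comments in the diff. It contrasts the index layout those comments describe
with simply carrying a row count:

```rust
/// Hypothetical helper (not part of Comet): builds the layout described in the
/// diff's comments for a batch routed entirely to partition 0.
/// Assumes `partition_starts` holds `num_partitions + 1` offsets.
fn all_rows_to_partition_zero(num_rows: usize, num_partitions: usize) -> (Vec<u32>, Vec<u32>) {
    // partition_starts = [0, num_rows, num_rows, ...]
    let mut partition_starts = vec![num_rows as u32; num_partitions + 1];
    partition_starts[0] = 0;
    // partition_row_indices = [0, 1, 2, ..., num_rows - 1]; costs O(num_rows) writes
    let partition_row_indices: Vec<u32> = (0..num_rows as u32).collect();
    (partition_starts, partition_row_indices)
}

/// Hypothetical alternative: a zero-column batch carries no data besides its
/// row count, so one counter per partition holds the same information.
fn count_only(num_rows: usize, num_partitions: usize) -> Vec<usize> {
    assert!(num_partitions > 0, "need at least one partition");
    let mut rows_per_partition = vec![0usize; num_partitions];
    rows_per_partition[0] = num_rows;
    rows_per_partition
}
```

For example, `all_rows_to_partition_zero(4, 3)` builds two vectors, one of
which grows with the batch, while `count_only(4, 3)` needs only a fixed-size
vector of counters. Whether the existing buffering path can accept a bare
count is something I have not checked.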
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]