jizezhang commented on code in PR #19002:
URL: https://github.com/apache/datafusion/pull/19002#discussion_r2621826156
##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -1531,6 +1542,43 @@ impl PerPartitionStream {
}
}
}
+
+ fn poll_next_and_coalesce(
Review Comment:
Are you perhaps referring to this method call
https://github.com/apache/datafusion/blob/fc8824011bf5d4baccbfe51b3888ed5573ef3bfb/datafusion/physical-plan/src/repartition/mod.rs#L542
made when pulling batches from the input partitions, and suggesting that we
combine it with coalescing? If so, whether to coalesce batches in the input
partition stream or in the output partition stream was [discussed
briefly](https://github.com/apache/datafusion/issues/18782#issuecomment-3563395564).
The current implementation coalesces in the output stream, since that
preserves the existing behavior most closely. Because batches are sent over
channels from the input streams to the output streams, I am not sure how we
would combine the two. That said, I may have misunderstood you, or it may
actually be better to coalesce when pulling from the input streams given the
optimization. Please let me know what you think.
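To make the output-side option concrete, here is a minimal std-only sketch of the idea. It is a toy model, not the code in this PR: `Batch`, `coalesce_from_channel`, and `demo_output_side_coalescing` are hypothetical names, a `Vec<i32>` stands in for a `RecordBatch`, and a synchronous `std::sync::mpsc` channel stands in for the channels between input and output streams.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Toy stand-in for a record batch: just a vector of row values.
/// (Assumption for illustration only; the real code works on Arrow batches.)
type Batch = Vec<i32>;

/// Drains `rx` and regroups the incoming rows into batches of `target_rows`,
/// emitting a final, possibly smaller batch once the channel is closed.
fn coalesce_from_channel(rx: Receiver<Batch>, target_rows: usize) -> Vec<Batch> {
    assert!(target_rows > 0, "target_rows must be positive");
    let mut out = Vec::new();
    let mut buffer: Batch = Vec::with_capacity(target_rows);
    // Iterating a Receiver yields values until every sender has been dropped.
    for batch in rx {
        for row in batch {
            buffer.push(row);
            if buffer.len() == target_rows {
                // Hand off the full buffer and start a fresh one.
                out.push(std::mem::replace(
                    &mut buffer,
                    Vec::with_capacity(target_rows),
                ));
            }
        }
    }
    // Flush whatever is left over after the input is exhausted.
    if !buffer.is_empty() {
        out.push(buffer);
    }
    out
}

/// Simulates the input side sending small batches over a channel, then
/// coalesces them on the receiving (output) side.
fn demo_output_side_coalescing() -> Vec<Batch> {
    let (tx, rx) = channel();
    for small in [vec![1, 2], vec![3], vec![4, 5, 6], vec![7]] {
        tx.send(small).unwrap();
    }
    drop(tx); // close the channel so the receiver loop terminates
    coalesce_from_channel(rx, 3)
}
```

In this model the sender side is untouched and only the receiver regroups rows, which is the sense in which output-side coalescing keeps the existing behavior closest.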
In the sort-preserving case, a `BatchBuilder` is used
https://github.com/apache/datafusion/blob/fc8824011bf5d4baccbfe51b3888ed5573ef3bfb/datafusion/physical-plan/src/sorts/merge.rs#L44
which has methods such as `push_row` and `build_record_batch`
https://github.com/apache/datafusion/blob/fc8824011bf5d4baccbfe51b3888ed5573ef3bfb/datafusion/physical-plan/src/sorts/builder.rs#L112-L125
and the latter internally calls `interleave` from arrow. Could this also be
replaced or improved with the optimization you mentioned, or is that a
different matter?
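For reference, this is roughly what I understand an interleave-style kernel to do, written as a std-only toy. `interleave_rows` is a hypothetical helper over plain slices, not arrow's actual `interleave`, and the `(batch_index, row_index)` pairs only mirror the general shape of the indices a merge would record.

```rust
/// Toy model of an interleave kernel: `indices` holds
/// (batch_index, row_index) pairs, and the output contains the referenced
/// rows in exactly that order. Returns `None` if any index is out of bounds.
fn interleave_rows<T: Clone>(
    batches: &[&[T]],
    indices: &[(usize, usize)],
) -> Option<Vec<T>> {
    indices
        .iter()
        .map(|&(batch_idx, row_idx)| batches.get(batch_idx)?.get(row_idx).cloned())
        .collect()
}
```

A sort-preserving merge of two sorted inputs `[1, 4, 6]` and `[2, 3, 5]` would record the pairs `(0,0), (1,0), (1,1), (0,1), (1,2), (0,2)` and build the merged output from them in one call.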
Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]