mustafasrepo commented on code in PR #7488:
URL: https://github.com/apache/arrow-datafusion/pull/7488#discussion_r1319488427
##########
datafusion/core/src/physical_plan/repartition/mod.rs:
##########
@@ -637,7 +637,8 @@ impl RepartitionExec {
/// Set Order preserving flag
pub fn with_preserve_order(mut self, preserve_order: bool) -> Self {
- self.preserve_order = preserve_order;
+ // Set "preserve order" mode only if the operator cannot maintain the
order:
+ self.preserve_order = preserve_order &&
!self.maintains_input_order()[0];
Review Comment:
The reason for this change is that, `RepartitionExec` and
`SortPreservingRepartitionExec` has different approaches.
`SortPreservingRepartitionExec` adds additional overhead compared to
`RepartitionExec`. For this reason wo not want to use
`SortPreservingRepartitionExec` unless it is necessary. (`preserve_order` flag
determines working mode, if `true` mode is `SortPreservingRepartitionExec` is
used otherwise `RepartitionExec` is used. However, when input partition number
is 1, `RepartitionExec` can also maintain ordering. Hence for this case, we do
not need to use `SortPreservingRepartitionExec`). I agree that your proposed
version is more readable. I changed the implementation according to your
suggestion
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]