mustafasrepo commented on PR #10259:
URL: https://github.com/apache/datafusion/pull/10259#issuecomment-2082318558

   I think having a dedicated config setting (as in `prefer_existing_union`) 
is more explicit and clearer. Using `prefer_existing_sort` might also work. 
However, if the condition to replace `UnionExec` with `InterleaveExec` is 
changed to
   ```rust
   plan.as_any().is::<UnionExec>() 
   && !config.optimizer.prefer_existing_sort 
   && can_interleave(children_plans.iter())
   ```
   this will prefer `UnionExec` over `InterleaveExec` whenever the 
`config.optimizer.prefer_existing_sort` flag is `true`, even if the inputs of 
the `UnionExec` are unordered. That might be counterintuitive, since there is 
no ordering to preserve. In contrast, `config.optimizer.prefer_existing_union` 
does exactly what it says, so it is a bit clearer to me. Hence, I think it is 
better to proceed with the current approach in this PR.
   
   In the future, if we add support for an `OrderPreservingInterleaveExec` 
(which might be accomplished by replacing 
[`CombinedRecordBatchStream`](https://github.com/apache/datafusion/blob/b41ef20c5dad7bdd674e3cc5f35a9c99efae676c/datafusion/physical-plan/src/union.rs#L427)
with `streaming_merge` in the `fn execute` method of `InterleaveExec`), 
using the `config.optimizer.prefer_existing_sort` flag to decide between 
`InterleaveExec` and `OrderPreservingInterleaveExec` might solve the issue. 
That approach may remove the need for the `prefer_existing_union` setting. 
However, until we have this support, the current approach is much clearer.
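   As a rough illustration of why a streaming merge would preserve order where the current stream combination does not, here is a sketch over plain sorted `Vec`s rather than record-batch streams. `interleave_unordered`, `merge_sorted` and `combine` are hypothetical names; the concatenation only stands in for combining streams without any cross-input ordering, and the k-way merge (a min-heap over the heads of each sorted input) is the general technique, not DataFusion's `streaming_merge` implementation.

   ```rust
   use std::cmp::Reverse;
   use std::collections::BinaryHeap;

   /// Stand-in for combining inputs with no cross-input ordering guarantee
   /// (a real stream combiner may emit batches in any order; here we simply
   /// concatenate for determinism).
   pub fn interleave_unordered<T: Clone>(inputs: &[Vec<T>]) -> Vec<T> {
       inputs.iter().flat_map(|v| v.iter().cloned()).collect()
   }

   /// Order-preserving combination: k-way merge of individually sorted inputs
   /// using a min-heap keyed on (value, input index, position).
   pub fn merge_sorted<T: Ord + Clone>(inputs: &[Vec<T>]) -> Vec<T> {
       let mut heap = BinaryHeap::new();
       for (i, input) in inputs.iter().enumerate() {
           if let Some(first) = input.first() {
               heap.push(Reverse((first.clone(), i, 0usize)));
           }
       }
       let mut out = Vec::with_capacity(inputs.iter().map(Vec::len).sum());
       while let Some(Reverse((value, i, pos))) = heap.pop() {
           out.push(value);
           // Refill the heap from the input we just consumed from.
           if let Some(next) = inputs[i].get(pos + 1) {
               heap.push(Reverse((next.clone(), i, pos + 1)));
           }
       }
       out
   }

   /// Hypothetical selector mirroring the suggestion: `prefer_existing_sort`
   /// picks the order-preserving variant.
   pub fn combine<T: Ord + Clone>(inputs: &[Vec<T>], prefer_existing_sort: bool) -> Vec<T> {
       if prefer_existing_sort {
           merge_sorted(inputs)
       } else {
           interleave_unordered(inputs)
       }
   }
   ```

   The trade-off is the usual one: merging needs a comparison per output row and must wait for every input to have data available, whereas plain interleaving can forward whatever arrives first.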


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
