alamb commented on pull request #378:
URL: https://github.com/apache/arrow-datafusion/pull/378#issuecomment-847217916
> I think you are right about that, we should not rely on the optimizer to
make the execution plan correct. I think it would be better if the planner adds
the MergeExecs for the appropriate nodes.
I agree with @tustvold and @Dandandan on this -- I think the plan should
generate correct results without requiring optimizer passes being run. The
optimizer passes should just (potentially) make the plans faster.
> I therefore think the addition of a preserve_partitioning flag to
SortExec is necessary and has precedent.
I agree
> Currently Repartition may insert RepartitionExec between an operator and
its children, provided that operator doesn't require a single partition. It is
then reliant on a later optimisation pass with AddMergeExec to join together
the partitions if a operator further up the tree requires it.
Is there any reason we can't call `AddMergeExec` multiple times? Once (and
always) as part of creating the physical plans and then potentially again as
part of `Repartition`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]