alamb commented on pull request #378:
URL: https://github.com/apache/arrow-datafusion/pull/378#issuecomment-847217916


   > I think you are right about that, we should not rely on the optimizer to 
make the execution plan correct. I think it would be better if the planner adds 
the MergeExecs for the appropriate nodes.
   
   I agree with @tustvold  and @Dandandan on this -- I think the plan should 
generate correct results without requiring optimizer passes being run. The 
optimizer passes should just (potentially) make the plans faster. 
   
    > I therefore think the addition of a preserve_partitioning flag to 
SortExec is necessary and has precedent.
   
   I agree
   
   > Currently Repartition may insert RepartitionExec between an operator and 
its children, provided that operator doesn't require a single partition. It is 
then reliant on a later optimisation pass with AddMergeExec to join together 
the partitions if a operator further up the tree requires it.
   
   Is there any reason we can't call `AddMergeExec` multiple times? Once (and 
always) as part of creating the physical plans and then potentially again as 
part of `Repartition`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to