alamb commented on PR #5171:
URL: https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1421593187

   > Regarding rule ordering: since the DataFusion optimization framework is 
still a traditional heuristic-style framework, the order in which rules are 
applied always matters, and we cannot assume one rule can work independently of 
the others.
   
   > Specifically, the `EnforceDistribution` rule is responsible for handling 
the global distribution requirements.
   And the `EnforceSorting` rule is responsible for handling the local sort 
requirements. It is also responsible for removing
   unnecessary global and local sorts. The global distribution requirements 
need to be handled first; after that we can handle the local 
(inner-partition) sort requirements.
   
   
   Thank you @mingmwang -- I think part of what is confusing here is that two 
different things are happening as "optimization" passes.
   
   1. "Fixing up the plan for correctness" (aka "EnforceSorting"), which I 
think is very similar at a high level to what the 
[TypeCoercion](https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/src/type_coercion.rs)
 logical optimizer rule does (it coerces types in expressions so they are 
compatible even if that was not the case in the input plan)
   2. "Keep the same semantics of the plan, but rewrite it for better 
performance" (aka GlobalSortSelection / OptimizeSorts)
   
   I think @liukun4515 helped the logical optimizer greatly by identifying 
this difference and pulling all the type coercion to the beginning of the 
optimizer passes ([source 
link](https://github.com/apache/arrow-datafusion/blob/e222bd627b6e7974133364fed4600d74b4da6811/datafusion/optimizer/src/optimizer.rs#L207)).
 We probably could have gone further and made it clear that the TypeCoercion 
pass is not an optimization but rather is required for correctness.
   
   Maybe the same kind of clarity would help in this case too.
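
To make the distinction concrete, here is a minimal sketch of what separating the two kinds of passes could look like. Everything in it is hypothetical and only for illustration: `Plan`, `Pipeline`, `enforce_sorting`, and `remove_redundant_sort` are toy stand-ins, not DataFusion's actual types or rules. The idea is that "required" passes always run, and run first, while "optimization" passes may be skipped without affecting correctness.

```rust
/// Toy physical plan (hypothetical; not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf node that may or may not already produce sorted output.
    Scan { sorted: bool },
    /// Sorts its input.
    Sort(Box<Plan>),
    /// Operator that is only correct if its input is sorted.
    SortedConsumer(Box<Plan>),
}

/// A pass is just a plan-to-plan rewrite in this sketch.
type Rule = fn(Plan) -> Plan;

/// Does this plan produce sorted output?
fn is_sorted(plan: &Plan) -> bool {
    match plan {
        Plan::Scan { sorted } => *sorted,
        Plan::Sort(_) => true,
        // Assumed to preserve the order of its input.
        Plan::SortedConsumer(input) => is_sorted(input),
    }
}

/// Correctness pass: add a Sort below any consumer whose input is unsorted.
fn enforce_sorting(plan: Plan) -> Plan {
    match plan {
        Plan::SortedConsumer(input) => {
            let input = enforce_sorting(*input);
            if is_sorted(&input) {
                Plan::SortedConsumer(Box::new(input))
            } else {
                Plan::SortedConsumer(Box::new(Plan::Sort(Box::new(input))))
            }
        }
        Plan::Sort(input) => Plan::Sort(Box::new(enforce_sorting(*input))),
        scan => scan,
    }
}

/// Optimization pass: drop a Sort whose input is already sorted.
/// Semantics are unchanged; the plan just does less work.
fn remove_redundant_sort(plan: Plan) -> Plan {
    match plan {
        Plan::Sort(input) => {
            let input = remove_redundant_sort(*input);
            if is_sorted(&input) {
                input
            } else {
                Plan::Sort(Box::new(input))
            }
        }
        Plan::SortedConsumer(input) => {
            Plan::SortedConsumer(Box::new(remove_redundant_sort(*input)))
        }
        scan => scan,
    }
}

/// Keeps the two kinds of passes in separate, explicitly named phases.
struct Pipeline {
    required: Vec<Rule>,
    optimizations: Vec<Rule>,
}

impl Pipeline {
    /// Required passes always run first; optimizations are optional.
    fn run(&self, plan: Plan, optimize: bool) -> Plan {
        let plan = self.required.iter().fold(plan, |p, rule| rule(p));
        if optimize {
            self.optimizations.iter().fold(plan, |p, rule| rule(p))
        } else {
            plan
        }
    }
}

fn default_pipeline() -> Pipeline {
    Pipeline {
        required: vec![enforce_sorting as Rule],
        optimizations: vec![remove_redundant_sort as Rule],
    }
}
```

With this split, calling `default_pipeline().run(plan, false)` still yields a correct plan (any missing sort is inserted), and passing `true` only removes work that was not needed. Ordering between the two phases is then explicit in the structure rather than implied by positions in a single list of rules.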


