alamb commented on PR #5171: URL: https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1421593187
> Regarding the rule applying ordering, since DataFusion optimization framework is still a traditional heuristic style framework, the rule applying orders always matter, we can not assume one rule can work independently without the others.
>
> Specifically, the `EnforceDistribution` rule is responsible for handling the global distribution requirements. And EnforceSorting rule is responsible for handling the local sort requirements. It's also responsible for removing unnecessary global sort and local sort. The global distribution requirements need to be handled first, after that we can handle the local sort(inner-partition) requirements.

Thank you @mingmwang. I think part of what is confusing here is that two different kinds of work are happening as "optimization" passes:

1. "Fixing up the plan for correctness" (aka `EnforceSorting`). At a high level, I think this is very similar to what the [TypeCoercion](https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/src/type_coercion.rs) logical optimizer rule does: it coerces types in expressions so they are compatible, even if that was not the case in the input plan.
2. "Keep the same semantics of the plan, but rewrite it for better performance" (aka `GlobalSortSelection` / `OptimizeSorts`).

I think @liukun4515 helped the logical optimizer greatly by identifying this difference and pulling all the type coercion to the beginning of the optimizer passes ([source link](https://github.com/apache/arrow-datafusion/blob/e222bd627b6e7974133364fed4600d74b4da6811/datafusion/optimizer/src/optimizer.rs#L207)). We probably could have gone further and made it clear that the TypeCoercion pass is not an optimizer but is rather required for correctness. Maybe similar clarity would help in this case too.
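A minimal, self-contained Rust sketch of the ordering argument above, under loudly stated assumptions: it does not use DataFusion's API. `Op`, `Rule`, `enforce_distribution`, `enforce_sorting`, `remove_redundant_sorts`, `apply`, and `optimize` are hypothetical stand-ins for the real `EnforceDistribution`, `EnforceSorting`, and sort-optimization rules, and the plan is simplified to a linear chain of operators. It illustrates why distribution has to be enforced before sorting (a repartition destroys any per-partition order), and how correctness passes can run as a fixed prefix before performance-only passes, similar in spirit to moving TypeCoercion to the start of the logical optimizer.

```rust
// Hypothetical toy operators (NOT DataFusion's ExecutionPlan).
// A plan is a linear chain: leaf first, root last.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Scan,        // produces unpartitioned, unsorted data
    Repartition, // hash-partitions its input; destroys any existing order
    Sort,        // sorts within each partition; preserves partitioning
    Merge,       // requires hash-partitioned AND sorted input
}

// Hypothetical rule signature: rewrite a whole plan.
type Rule = fn(Vec<Op>) -> Vec<Op>;

/// Returns true if every Merge sees partitioned and sorted input.
fn is_valid(plan: &[Op]) -> bool {
    let (mut partitioned, mut sorted) = (false, false);
    for op in plan {
        match op {
            Op::Scan => { partitioned = false; sorted = false; }
            Op::Repartition => { partitioned = true; sorted = false; }
            Op::Sort => sorted = true,
            Op::Merge => if !(partitioned && sorted) { return false; },
        }
    }
    true
}

/// Correctness pass (EnforceDistribution-like): add a Repartition before an unpartitioned Merge.
fn enforce_distribution(plan: Vec<Op>) -> Vec<Op> {
    let (mut out, mut partitioned) = (Vec::new(), false);
    for op in plan {
        if op == Op::Merge && !partitioned {
            out.push(Op::Repartition);
            partitioned = true;
        }
        match op {
            Op::Scan => partitioned = false,
            Op::Repartition => partitioned = true,
            _ => {}
        }
        out.push(op);
    }
    out
}

/// Correctness pass (EnforceSorting-like): add a Sort before an unsorted Merge.
fn enforce_sorting(plan: Vec<Op>) -> Vec<Op> {
    let (mut out, mut sorted) = (Vec::new(), false);
    for op in plan {
        if op == Op::Merge && !sorted {
            out.push(Op::Sort);
            sorted = true;
        }
        match op {
            Op::Scan | Op::Repartition => sorted = false,
            Op::Sort => sorted = true,
            Op::Merge => {}
        }
        out.push(op);
    }
    out
}

/// Performance pass: drop a Sort whose input is already sorted (same semantics, less work).
fn remove_redundant_sorts(plan: Vec<Op>) -> Vec<Op> {
    let (mut out, mut sorted) = (Vec::new(), false);
    for op in plan {
        match op {
            Op::Sort if sorted => continue,
            Op::Sort => sorted = true,
            Op::Scan | Op::Repartition => sorted = false,
            Op::Merge => {}
        }
        out.push(op);
    }
    out
}

/// Apply rules in the given order; with heuristic rules, the order matters.
fn apply(plan: Vec<Op>, rules: &[Rule]) -> Vec<Op> {
    rules.iter().fold(plan, |p, rule| rule(p))
}

/// Correctness passes first, then performance passes that must keep the plan valid.
fn optimize(plan: Vec<Op>, correctness: &[Rule], performance: &[Rule]) -> Vec<Op> {
    let plan = apply(plan, correctness);
    assert!(is_valid(&plan), "correctness passes produced an invalid plan");
    let plan = apply(plan, performance);
    assert!(is_valid(&plan), "a performance pass changed plan semantics");
    plan
}

fn main() {
    let correctness: [Rule; 2] = [enforce_distribution, enforce_sorting];
    let performance: [Rule; 1] = [remove_redundant_sorts];
    let plan = optimize(vec![Op::Scan, Op::Merge], &correctness, &performance);
    println!("{:?}", plan); // [Scan, Repartition, Sort, Merge]

    // Reversed order: the Repartition added last destroys the sort order.
    let reversed: [Rule; 2] = [enforce_sorting, enforce_distribution];
    let bad = apply(vec![Op::Scan, Op::Merge], &reversed);
    println!("{:?} valid={}", bad, is_valid(&bad)); // [Scan, Sort, Repartition, Merge] valid=false
}
```

In this toy model the distribution pass only adds operators that can invalidate ordering, never the other way around, so running it first lets the sorting pass see the final partitioning. The real rules handle many more operators and properties; the sketch only captures the dependency described in the quoted comment.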
