ozankabak commented on PR #5171: URL: https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1420248052
> @mustafasrepo @ozankabak Regarding the order in which rules are applied: since the DataFusion optimization framework is still a traditional heuristic-style framework, rule ordering always matters, and we cannot assume that one rule works independently of the others.
>
> Specifically, the `EnforceDistribution` rule is responsible for handling the global distribution requirements, and the `EnforceSorting` rule is responsible for handling the local sort requirements. It is also responsible for removing unnecessary global and local sorts. The global distribution requirements need to be handled first; after that, we can handle the local (intra-partition) sort requirements.
>
> Global properties vs. local properties: http://www.cs.albany.edu/~jhh/courses/readings/zhou10.pdf

I agree that fixing partitioning (global) first and then sorting (local) is the more intuitive order, but in theory this does not seem strictly necessary to me. I can imagine changing global properties while still preserving the previous local properties of every partition (in the new plan). I think such behavior would make the rules very robust and easy to reason about. The current PR is not really about this anyway, but that is my general line of thinking when we refer to orthogonality.

Nevertheless, maybe you are aware of a fundamental issue (one I am not foreseeing right now) that makes this impossible. If that is the case, then we will of course go with the current status quo.
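To make the orthogonality idea concrete, here is a minimal sketch of a rule-level operation that changes the global distribution while preserving the local property "each partition is sorted". It assumes integer keys, in-memory partitions, and a simple modulo partitioner as a stand-in for hash partitioning. It is not DataFusion's API: `sort_preserving_repartition`, `naive_repartition`, `target_partition`, and `partition_is_sorted` are hypothetical names for illustration. The idea is to k-way merge the already-sorted input partitions and route each row to its target partition in merge order, so every output partition stays sorted. A naive repartition that drains inputs one after another loses that order.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical partitioner: modulo partitioning as a stand-in for hash partitioning.
fn target_partition(key: i64, n: usize) -> usize {
    key.rem_euclid(n as i64) as usize
}

/// Repartition into `n` partitions while keeping each output partition sorted.
/// Assumes every input partition is already sorted ascending.
fn sort_preserving_repartition(inputs: &[Vec<i64>], n: usize) -> Vec<Vec<i64>> {
    let mut outputs = vec![Vec::new(); n];
    // Min-heap of (value, input index, position within that input).
    let mut heap = BinaryHeap::new();
    for (i, part) in inputs.iter().enumerate() {
        if let Some(&v) = part.first() {
            heap.push(Reverse((v, i, 0usize)));
        }
    }
    // Rows leave the heap in global sorted order, so appending each one to its
    // target partition keeps every output partition sorted.
    while let Some(Reverse((v, i, pos))) = heap.pop() {
        outputs[target_partition(v, n)].push(v);
        if let Some(&next) = inputs[i].get(pos + 1) {
            heap.push(Reverse((next, i, pos + 1)));
        }
    }
    outputs
}

/// Naive repartition for contrast: drains input partitions one after another.
fn naive_repartition(inputs: &[Vec<i64>], n: usize) -> Vec<Vec<i64>> {
    let mut outputs = vec![Vec::new(); n];
    for part in inputs {
        for &v in part {
            outputs[target_partition(v, n)].push(v);
        }
    }
    outputs
}

fn partition_is_sorted(p: &[i64]) -> bool {
    p.windows(2).all(|w| w[0] <= w[1])
}

fn main() {
    // Two input partitions, each already sorted on the key.
    let inputs = vec![vec![1, 4, 7, 10], vec![2, 3, 8, 9]];
    let preserved = sort_preserving_repartition(&inputs, 2);
    let naive = naive_repartition(&inputs, 2);
    println!("sort-preserving: {:?}", preserved); // [[2, 4, 8, 10], [1, 3, 7, 9]]
    println!("naive:           {:?}", naive); // [[4, 10, 2, 8], [1, 7, 3, 9]]
    assert!(preserved.iter().all(|p| partition_is_sorted(p)));
}
```

In this sketch the merge-based variant pays for a heap and for consuming inputs in lockstep, while the naive variant is cheaper but forces a later re-sort. That trade-off is what would let a distribution-changing step leave previously established local sort properties intact.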
