ozankabak commented on PR #5290: URL: https://github.com/apache/arrow-datafusion/pull/5290#issuecomment-1432298878
The PR is very large in scope. It changes parts of the old code (and certainly makes some changes to its tests), and also adds new code (and new tests). It would be much easier to review this if it were broken down to two PRs, where the first one only replicates the current functionality, has no functionality regressions, and does not change any tests at all; with the second PR adding new functionality. Right now, the new rule is significantly longer than the old rule (which is bad), but it offers more functionality (which is great). So is switching from bottom-up to top-down a good change or a bad change? We can't tell easily. Now, let me share my (very) preliminary impression so far after a cursory look: I see that it has better handling of sort preserving merges, smarter push-down of sorts under unions, and adds support for sort merge joins. These are good the bits. The cons are that it seems to regress on some cases where it doing better before. Most of these argue that it does this to preserve global output ordering but I am not sure I agree with that. Anyway, we will disentangle and review in detail, but I want to give you a heads up that this will take some time. We will need to analyze every case carefully, go back to the old version of the code (and tests), compare and contrast etc. Before we form an idea on the merits of bottom-up vs. top-down, our goal will be to create *two functionally equal* implementations passing *exactly the same* test suite. Without that, it is not possible to objectively decide. Whatever the result on bottom-up vs. top-down is, I think this exercise will end up making the rule better, so that's great. I will keep you posted as we make progress in the upcoming days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
