ozankabak commented on PR #5290:
URL: 
https://github.com/apache/arrow-datafusion/pull/5290#issuecomment-1432298878

   The PR is very large in scope. It changes parts of the old code (and 
certainly makes some changes to its tests), and also adds new code (and new 
tests). It would be much easier to review this if it were broken down to two 
PRs, where the first one only replicates the current functionality, has no 
functionality regressions, and does not change any tests at all; with the 
second PR adding new functionality. Right now, the new rule is significantly 
longer than the old rule (which is bad), but it offers more functionality 
(which is great). So is switching from bottom-up to top-down a good change or a 
bad change? We can't tell easily.
   
   Now, let me share my (very) preliminary impression so far after a cursory 
look: I see that it has better handling of sort preserving merges, smarter 
push-down of sorts under unions, and adds support for sort merge joins. These 
are good the bits. The cons are that it seems to regress on some cases where it 
doing better before. Most of these argue that it does this to preserve global 
output ordering but I am not sure I agree with that.
   
   Anyway, we will disentangle and review in detail, but I want to give you a 
heads up that this will take some time. We will need to analyze every case 
carefully, go back to the old version of the code (and tests), compare and 
contrast etc. Before we form an idea on the merits of bottom-up vs. top-down, 
our goal will be to create *two functionally equal* implementations passing 
*exactly the same* test suite. Without that, it is not possible to objectively 
decide.
   
   Whatever the result on bottom-up vs. top-down is, I think this exercise will 
end up making the rule better, so that's great. I will keep you posted as we 
make progress in the upcoming days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to