ozankabak commented on PR #5171:
URL: 
https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1421228337

   I don't think this is a bug. Let's think about the converse scenario: The 
non-optimized query could have produced the other order (which would be valid), 
or the user could have changed the order of columns, and in that case we would 
have the illusion of preserving "the order" during optimization. In general, 
whenever there are multiple possibilities for what constitutes a valid query, 
there will always be some configurations where non-optimized plans and 
optimized plans differ (or agree) in under-constrained aspects.
   
   At the end of the day, the optimizer's prime job is to end up with more 
efficient plans that obey the specification, not to conform to arbitrary 
behaviors of the non-optimized plan. In this case, there is simply no order in 
the specification, so I don't see a bug. The result is indeed correct.
   
   This being said, I think I understand the general suggestion you are making. 
In my words, I would put it this way: When there are multiple _equivalent_ 
optimizations, it is a good idea to choose the one that resembles the 
non-optimized query the most. I agree with this, and making progress towards 
this desideratum in follow-ons, refactors, etc. would be very nice.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
