ozankabak commented on PR #5171: URL: https://github.com/apache/arrow-datafusion/pull/5171#issuecomment-1421228337
I don't think this is a bug. Let's think about the converse scenario: the non-optimized query could have produced the other order (which would also be valid), or the user could have changed the order of columns, and in that case we would have the illusion of preserving "the order" during optimization. In general, whenever there are multiple possibilities for what constitutes a valid query, there will always be some configurations where non-optimized and optimized plans differ (or agree) in under-constrained aspects.

At the end of the day, the optimizer's prime job is to end up with more efficient plans that obey the specification, not to conform to arbitrary behaviors of the non-optimized plan. In this case, there is simply no order in the specification, so I don't see a bug. The result is indeed correct.

That being said, I think I understand the general suggestion you are making. In my words, I would put it this way: when there are multiple _equivalent_ optimizations, it is a good idea to choose the one that most resembles the non-optimized query. I agree with this, and making progress towards this desideratum in follow-ons, refactors, etc. would be very nice.