mingmwang commented on PR #6009:
URL: https://github.com/apache/arrow-datafusion/pull/6009#issuecomment-1508280788

> > And I do not think it is a bug.
>
> I mean #5970 clearly shows that there is SOME bug in the DF code base and, as illustrated there, different parts of the code could be held responsible for it. So while the technical bug is probably in the optimizer, the fact that `UnionExec` has its own internal optimizer and that people clearly don't know about it (also because it's not really documented) made me conclude that we should probably have a cleaner design instead of fixing more edge cases that are a consequence of the current design.

I get your points. I agree that the internal optimization logic inside `UnionExec` is not visible to others, which is not good. Basically, we need to consider the following requirements:

1. Sometimes we would like the `UnionExec` to preserve ordering.
2. Sometimes we would like the `UnionExec` to preserve partitioning.
3. The physical plan should clearly display whether the `UnionExec` is `partition-aware` or `ordering-aware`.
4. The physical optimizer rules (enforce sort / enforce distribution) need to handle all the different cases correctly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
