alamb commented on issue #17719: URL: https://github.com/apache/datafusion/issues/17719#issuecomment-3380847139
> [@alamb](https://github.com/alamb) Do you have an intuition whether join ordering should be done as a LogicalPlan optimization, a PhysicalPlan optimization, or during physical planning?

I think the intuition is that the best join order is typically based mostly on estimated cardinality (you want to plan the most selective joins first). Cardinality is a function of the predicates and the join order, not of the physical characteristics of the plan (e.g. the join algorithms used).

That being said, I have definitely seen queries where a slightly less optimal join order is better for some reason (e.g. it keeps the data sorted so you can use a MergeJoin rather than a HashJoin), so I think there is room for discussion here.

I think creating a JoinGraph structure for DataFusion's `ExecutionPlan` rather than `LogicalPlan` is definitely worth considering. As you say, the current physical APIs have much more information, and cardinality estimation is currently done at the physical level 🤔
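To make the cardinality-driven intuition concrete, here is a minimal greedy join-ordering sketch. It is not DataFusion's optimizer or API: `Relation`, `JoinEdge`, and `greedy_join_order` are hypothetical names, and the row counts and selectivities are assumed inputs that a real planner would get from statistics. The heuristic starts from the smallest relation. At each step it joins the relation that yields the smallest estimated intermediate result, computed as current rows × relation rows × combined predicate selectivity. Like the reasoning above, it ignores physical properties such as sort order, so it would never choose a slightly worse order just to enable a MergeJoin.

```rust
use std::collections::HashSet;

/// Hypothetical relation with an estimated row count (e.g. after filters are applied).
#[derive(Debug, Clone)]
struct Relation {
    name: &'static str,
    est_rows: f64,
}

/// Hypothetical join predicate between relations `left` and `right` (indices into
/// the relation list), with an assumed selectivity in (0, 1].
#[derive(Debug, Clone)]
struct JoinEdge {
    left: usize,
    right: usize,
    selectivity: f64,
}

/// Greedy ordering by estimated cardinality. Returns relation indices in join order.
/// Relations with no predicate linking them to the joined set are treated as a
/// cross join (selectivity 1.0), which naturally makes them expensive.
fn greedy_join_order(rels: &[Relation], edges: &[JoinEdge]) -> Vec<usize> {
    if rels.is_empty() {
        return Vec::new();
    }
    // Start from the relation with the fewest estimated rows (panics on NaN estimates).
    let start = (0..rels.len())
        .min_by(|&a, &b| rels[a].est_rows.partial_cmp(&rels[b].est_rows).unwrap())
        .unwrap();
    let mut order = vec![start];
    let mut joined: HashSet<usize> = HashSet::from([start]);
    let mut current_rows = rels[start].est_rows;

    while joined.len() < rels.len() {
        let mut best: Option<(usize, f64)> = None;
        for cand in 0..rels.len() {
            if joined.contains(&cand) {
                continue;
            }
            // Multiply the selectivities of all predicates linking `cand` to the joined set.
            let mut sel = 1.0;
            for e in edges {
                let links = (e.left == cand && joined.contains(&e.right))
                    || (e.right == cand && joined.contains(&e.left));
                if links {
                    sel *= e.selectivity;
                }
            }
            // Estimated size of the intermediate result if `cand` is joined next.
            let est = current_rows * rels[cand].est_rows * sel;
            if best.map_or(true, |(_, b)| est < b) {
                best = Some((cand, est));
            }
        }
        let (next, est) = best.unwrap();
        order.push(next);
        joined.insert(next);
        current_rows = est;
    }
    order
}

fn main() {
    let rels = vec![
        Relation { name: "orders", est_rows: 1_000_000.0 },
        Relation { name: "customers", est_rows: 10_000.0 },
        Relation { name: "nation", est_rows: 25.0 },
    ];
    let edges = vec![
        JoinEdge { left: 0, right: 1, selectivity: 1.0 / 10_000.0 },
        JoinEdge { left: 1, right: 2, selectivity: 1.0 / 25.0 },
    ];
    let order = greedy_join_order(&rels, &edges);
    let names: Vec<&str> = order.iter().map(|&i| rels[i].name).collect();
    println!("{}", names.join(" -> ")); // prints "nation -> customers -> orders"
}
```

A greedy pass like this is cheap but can miss the optimal order. A cost model that also tracked output ordering could make the MergeJoin-versus-HashJoin trade-off explicit, and that kind of information is more readily available on an `ExecutionPlan` than on a `LogicalPlan`.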
