alamb commented on issue #22098: URL: https://github.com/apache/datafusion/issues/22098#issuecomment-4431721313
> So what I'd like to discuss is if the hash join order should be decided based on a more complex heuristic. For example, "if the difference in size between the tables is less than X, go by row count, otherwise go by byte size". It appears that Postgres also does something like this:

In general, picking the right join order is a complex and multi-faceted problem. It typically involves various heuristics, size and cardinality estimates, and many other things.

I am personally very skeptical that we can add more advanced heuristics that don't make the plans worse for some people (i.e. they will experience it as a regression). So while this particular heuristic update looks reasonable, I worry about unintended consequences. This is very much driven by my experience working with the Vertica optimizer, where we had all sorts of challenges with complex join orders.

I think a better approach is to make the heuristic more tunable / pluggable so people can plug in whatever heuristics they want.

There is more backstory here:
- https://github.com/apache/datafusion/issues/17718
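To make the "pluggable heuristic" idea concrete, here is a minimal Rust sketch of what such a hook could look like. It is not DataFusion's API: `InputStats`, `BuildSide`, `BuildSideHeuristic`, `SmallerBytes`, and `RowsIfSimilarSize` are all hypothetical names. The second implementation encodes the rule from the quoted proposal, assuming "difference in size" means the absolute difference in bytes and that X is a configurable `max_byte_difference`.

```rust
/// Hypothetical per-input statistics (NOT DataFusion's `Statistics` type).
#[derive(Debug, Clone, Copy)]
struct InputStats {
    num_rows: u64,
    total_byte_size: u64,
}

/// Which join input to use as the hash table (build) side.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildSide {
    Left,
    Right,
}

/// Hypothetical pluggable hook: users supply their own build-side policy.
trait BuildSideHeuristic {
    fn choose(&self, left: InputStats, right: InputStats) -> BuildSide;
}

/// Baseline policy: build on the input with fewer bytes (ties go left).
struct SmallerBytes;

impl BuildSideHeuristic for SmallerBytes {
    fn choose(&self, left: InputStats, right: InputStats) -> BuildSide {
        if left.total_byte_size <= right.total_byte_size {
            BuildSide::Left
        } else {
            BuildSide::Right
        }
    }
}

/// The quoted proposal, as an assumption: if the byte sizes differ by at most
/// `max_byte_difference` (the "X"), compare row counts; otherwise compare bytes.
struct RowsIfSimilarSize {
    max_byte_difference: u64,
}

impl BuildSideHeuristic for RowsIfSimilarSize {
    fn choose(&self, left: InputStats, right: InputStats) -> BuildSide {
        let diff = left.total_byte_size.abs_diff(right.total_byte_size);
        if diff <= self.max_byte_difference {
            // Sizes are "close": fewer rows wins (ties go left).
            if left.num_rows <= right.num_rows {
                BuildSide::Left
            } else {
                BuildSide::Right
            }
        } else {
            // Sizes are far apart: fall back to the byte-size rule.
            SmallerBytes.choose(left, right)
        }
    }
}

fn main() {
    let left = InputStats { num_rows: 1_000, total_byte_size: 80_000 };
    let right = InputStats { num_rows: 500, total_byte_size: 100_000 };

    // The planner would only see `&dyn BuildSideHeuristic`, so policies are swappable.
    let policies: Vec<(&str, Box<dyn BuildSideHeuristic>)> = vec![
        ("smaller bytes", Box::new(SmallerBytes)),
        ("rows if within 50kB", Box::new(RowsIfSimilarSize { max_byte_difference: 50_000 })),
        ("rows if within 10kB", Box::new(RowsIfSimilarSize { max_byte_difference: 10_000 })),
    ];
    for (name, policy) in &policies {
        println!("{name}: {:?}", policy.choose(left, right));
    }
}
```

The point of the trait is that the same pair of inputs can yield different build sides depending on the policy (here the 20kB byte difference falls inside the 50kB threshold but outside the 10kB one), so a default can stay conservative while users who hit regressions swap in their own rule.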
