The reason that there are two strategies is because of large joins. If your query joins 10 tables, the number of possible join orders is large (bounded by 10 factorial I believe) and therefore would overwhelm the Volcano planner, which must construct each possibility.
Therefore we have a heuristic algorithm that you should use for large joins. We gather the entire FROM clause into a data structure called MultiJoin, and a single rule call applies heuristics and spits out a join order that is probably close to optimal. When you are optimizing a query, you need to know whether you are in danger of being swallowed by the monster that is the complexity of large joins. If your query only joins 2 or 3 tables (and in some other situations too) you are not in danger and can safely exhaustively enumerate plans. > On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg <[email protected]> wrote: > > Hi, > > I'm presently working on optimizing the ordering of joins for queries and > had a few questions about the optimal way to do that with Calcite. > > I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs) > and spent some time experimenting with JoinAssociateRule, JoinCommuteRule, > and the rules related to MultiJoins. We're utilizing the volcano planner > for optimization at the present moment but also have the freedom to > customize the order and phases for the planner phases. > > 1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute > rules? Or are JoinAssociate and JoinCommute still recommended as the > standard way to handle reordering of joins? > 2. Our system only supports performing the join over two inputs and we > can't support MultiJoin as a physical operation. My understanding is that > the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the > join but will still produce a MultiJoin. What's the appropriate way to > convert a MultiJoin back to a set of joins? > 3. My understanding is that MultiJoin rules aren't compatible with the > volcano planner and should be run as part of a stage using the heuristic > planner. Is this understanding correct? > > Thank you for any help. > > --Jonathan Sternberg
