The reason that there are two strategies is because of large joins. If your 
query joins 10 tables, the number of possible join orders is large (bounded by 
10 factorial I believe) and therefore would overwhelm the Volcano planner, 
which must construct each possibility. 

Therefore we have a heuristic algorithm that you should use for large joins. We 
gather the entire FROM clause into a data structure called MultiJoin, and a 
single rule call applies heuristics and spits out a join order that is probably 
close to optimal. 

When you are optimizing a query, you need to know whether you are in danger of 
being swallowed by the monster that is the complexity of large joins. If your 
query only joins 2 or 3 tables (and in some other situations too) you are not 
in danger and can safely exhaustively enumerate plans. 

> On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg <[email protected]> wrote:
> 
> Hi,
> 
> I'm presently working on optimizing the ordering of joins for queries and
> had a few questions about the optimal way to do that with Calcite.
> 
> I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
> and spent some time experimenting with JoinAssociateRule, JoinCommuteRule,
> and the rules related to MultiJoins. We're utilizing the volcano planner
> for optimization at the present moment but also have the freedom to
> customize the order and phases for the planner phases.
> 
> 1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute
> rules? Or are JoinAssociate and JoinCommute still recommended as the
> standard way to handle reordering of joins?
> 2. Our system only supports performing the join over two inputs and we
> can't support MultiJoin as a physical operation. My understanding is that
> the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
> join but will still produce a MultiJoin. What's the appropriate way to
> convert a MultiJoin back to a set of joins?
> 3. My understanding is that MultiJoin rules aren't compatible with the
> volcano planner and should be run as part of a stage using the heuristic
> planner. Is this understanding correct?
> 
> Thank you for any help.
> 
> --Jonathan Sternberg

Reply via email to