Re: Optimal way to organize Joins in Calcite

Julian Hyde Mon, 03 Jul 2023 08:49:21 -0700

The reason that there are two strategies is because of large joins. If your 
query joins 10 tables, the number of possible join orders is large (bounded by 
10 factorial I believe) and therefore would overwhelm the Volcano planner, 
which must construct each possibility.


Therefore we have a heuristic algorithm that you should use for large joins. We 
gather the entire FROM clause into a data structure called MultiJoin, and a 
single rule call applies heuristics and spits out a join order that is probably 
close to optimal. 

When you are optimizing a query, you need to know whether you are in danger of 
being swallowed by the monster that is the complexity of large joins. If your 
query only joins 2 or 3 tables (and in some other situations too) you are not 
in danger and can safely exhaustively enumerate plans. 

> On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg <[email protected]> wrote:
> 
> Hi,
> 
> I'm presently working on optimizing the ordering of joins for queries and
> had a few questions about the optimal way to do that with Calcite.
> 
> I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
> and spent some time experimenting with JoinAssociateRule, JoinCommuteRule,
> and the rules related to MultiJoins. We're utilizing the volcano planner
> for optimization at the present moment but also have the freedom to
> customize the order and phases for the planner phases.
> 
> 1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute
> rules? Or are JoinAssociate and JoinCommute still recommended as the
> standard way to handle reordering of joins?
> 2. Our system only supports performing the join over two inputs and we
> can't support MultiJoin as a physical operation. My understanding is that
> the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
> join but will still produce a MultiJoin. What's the appropriate way to
> convert a MultiJoin back to a set of joins?
> 3. My understanding is that MultiJoin rules aren't compatible with the
> volcano planner and should be run as part of a stage using the heuristic
> planner. Is this understanding correct?
> 
> Thank you for any help.
> 
> --Jonathan Sternberg

Re: Optimal way to organize Joins in Calcite

Reply via email to