Bushy joins

Julian Hyde Fri, 25 Jul 2014 15:09:22 -0700

I want to make sure I’m getting my join inputs the right way round.

Suppose we have this query:


select *
from sales as s
join customer as c on s.customer_id = c.customer_id
join product as p on s.product_id = p.product_id
join product_class as pc on p.product_class_id = pc.product_class_id
where c.city = ‘San Francisco'

It’s a classic snowflake join:

  filtered-customer (1k) <= sales (86k) => product (1.5k) => product_class (110)

where “=>” means ‘many-to-one join’ and the numbers are row-counts.

Is this the best plan? (Each join node has the smaller input on the LHS.)

((scan(product_class) join scan(product)) join (filter(scan(customer)) join 
scan(sales)))

I’m close to producing this in OptimizeBushyJoinRule, but I want to check that 
it’s the right thing.

Julian

Bushy joins

Reply via email to