I fixed one or two bugs, and now the algorithm says: store x (customer x (product_class x product x sales)).
That also seems to me to be a reasonable plan. At this point I can’t say it’s optimal, even within the algorithm’s simplistic cost model and imperfect statistics.

I’ve achieved my goal for this milestone, that the algorithm can generate bushy joins that aren’t completely crazy. So I’m going to check in (with the rule disabled by default, of course).

Next step will be to run the rule multiple times with some random noise thrown into the cost model, see what alternatives it generates, and what their cost estimates are. With luck we’ll find that it gets close to optimal with little or no randomization.

And it would be interesting at some point to run the various alternatives through Hive-Tez to see whether the “best plan” according to this algorithm’s cost model is the best plan in the real world.

Julian

On Jul 25, 2014, at 3:43 PM, Mostafa Mokhtar <[email protected]> wrote:

> The algorithm is correct.
> As we don't benefit much from carrying over the extra columns throughout
> the join, since store is not expected to reduce the join output.
>
> On cn105 30TB TPC-DS Q17 the delta is 20% between the two plans, the
> faster plan is the one where store and item are joined late in the plan.
>
> Check the email with title "Performance of Q17 for bushy plan rewrite on
> CN105".
>
> Thanks
> Mostafa
>
>
>
> On Fri, Jul 25, 2014 at 3:33 PM, Julian Hyde <[email protected]> wrote:
>
>> On Jul 25, 2014, at 3:14 PM, Mostafa Mokhtar <[email protected]>
>> wrote:
>>
>> What would be the plan if we have this query?
>>
>> select *
>> from sales as s
>> join customer as c on s.customer_id = c.customer_id
>> join product as p on s.product_id = p.product_id
>> join product_class as pc on p.product_class_id = pc.product_class_id
>> join store as st on s.store_id = st.store_id
>> where c.city = 'San Francisco'
>>
>>
>> Where store doesn't have any filters and row count is 20.
>>
>>
>> The algorithm is currently telling me
>>
>> (((product_class x product)
>> x (customer x sales))
>> x store)
>>
>> My gut says
>>
>> (store
>> x ((product_class x product)
>> x (customer x sales)))
>>
>> What do you think?
>>
>> Julian
>>
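
The experiment proposed in the message (rerun the join ordering with random noise in the cost model and compare the alternatives) could be sketched roughly as below. This is only an illustration under stated assumptions, not the rule discussed in the thread: the greedy heuristic is a stand-in, every row count except store's 20 is made up, the filter selectivity is made up, and the cost model (sum of intermediate join output sizes) ignores row width, so it cannot tell a plan that joins store early from one that joins it late.

```python
import random

# Hypothetical row counts; only store = 20 comes from the thread.
ROWS = {"sales": 1_000_000, "customer": 10_000, "product": 1_000,
        "product_class": 100, "store": 20}
# Hypothetical selectivity of the filter c.city = 'San Francisco'.
FILTER = {"customer": 0.01}
# Join edges from the query, written as (foreign-key table, primary-key table).
EDGES = [("sales", "customer"), ("sales", "product"),
         ("product", "product_class"), ("sales", "store")]


def connecting_pk(left, right):
    """Primary-key table of the edge linking two table sets, or None."""
    for fk, pk in EDGES:
        if (fk in left and pk in right) or (fk in right and pk in left):
            return pk
    return None


def base_card(table):
    return ROWS[table] * FILTER.get(table, 1.0)


def plan(noise, rng):
    """Greedy stand-in: repeatedly join the pair with the smallest noisy estimate."""
    # Each component is (tree, set of tables, estimated cardinality).
    comps = [(t, frozenset([t]), base_card(t)) for t in ROWS]
    while len(comps) > 1:
        best = None
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                pk = connecting_pk(comps[i][1], comps[j][1])
                if pk is None:
                    continue  # no join condition, skip cross products
                card = comps[i][2] * comps[j][2] / ROWS[pk]
                noisy = card * rng.uniform(1 - noise, 1 + noise)
                if best is None or noisy < best[0]:
                    best = (noisy, i, j, card)
        _, i, j, card = best
        merged = ((comps[i][0], comps[j][0]), comps[i][1] | comps[j][1], card)
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps[0][0]


def cost(tree):
    """Return (tables, cardinality, cost) using the noise-free estimates."""
    if isinstance(tree, str):
        return frozenset([tree]), base_card(tree), 0.0
    lt, lc, lcost = cost(tree[0])
    rt, rc, rcost = cost(tree[1])
    card = lc * rc / ROWS[connecting_pk(lt, rt)]
    return lt | rt, card, lcost + rcost + card


def show(tree):
    return tree if isinstance(tree, str) else f"({show(tree[0])} x {show(tree[1])})"


def explore(runs=20, noise=0.3, seed=0):
    """Run the noisy planner several times; return distinct plans, cheapest first."""
    rng = random.Random(seed)
    found = {}
    for _ in range(runs):
        tree = plan(noise, rng)
        found[show(tree)] = cost(tree)[2]
    return sorted(found.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    for text, c in explore():
        print(f"{c:>12.0f}  {text}")
```

Calling `explore()` returns the distinct plans seen across the runs together with their noise-free cost, so the alternatives can be compared side by side. Setting `noise=0.0` gives the plain greedy result.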
