[ https://issues.apache.org/jira/browse/IMPALA-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong reassigned IMPALA-2796: ------------------------------------- Assignee: (was: Alexander Behm) > Suboptimal join ordering due to greedy candidate selection. > ----------------------------------------------------------- > > Key: IMPALA-2796 > URL: https://issues.apache.org/jira/browse/IMPALA-2796 > Project: IMPALA > Issue Type: Improvement > Components: Frontend > Affects Versions: Impala 2.2, Impala 2.3.0 > Reporter: Alexander Behm > Priority: Minor > Labels: performance, planning > > Consider the join order that we generate for TPCH-Q8 after fixing IMPALA-976: > {code} > hash join *region* on n1.n_regionkey = r_regionkey > hash join nation n1 on c_nationkey = n1.n_nationkey > hash join customer o_custkey = c_custkey > hash join nation n2 s_nationkey = n2.c_nationkey > hash join supplier on l_suppkey = s_suppkey > hash join *order* l_orderkey = o_orderkey > hash join *part* l_partkey = p_partkey > lineitem > {code} > In the pseudo-plan above, the tables in bold have selective predicates > applied to the scans. > Contrast the plan above with the following optimal join order: > {code} > hash join nation n2 s_nationkey = n2.c_nationkey > hash join supplier on l_suppkey = s_suppkey > hash join *region* on n1.n_regionkey = r_regionkey > hash join nation n1 on c_nationkey = n1.n_nationkey > hash join customer o_custkey = c_custkey > hash join *order* l_orderkey = o_orderkey > hash join *part* l_partkey = p_partkey > lineitem > {code} > This plan is better because the number of intermediate results are reduced by > executing the join on region first. The difference between the two plans is > that the following series of joins are "swapped": > This series of joins leads up to a selective join with region, and should > come before the block with supplier and n2. > {code} > hash join *region* on n1.n_regionkey = r_regionkey > hash join nation n1 on c_nationkey = n1.n_nationkey > hash join customer o_custkey = c_custkey > {code} > These series of joins are not selective and should come last. > {code} > hash join nation n2 s_nationkey = n2.c_nationkey > hash join supplier on l_suppkey = s_suppkey > {code} > Our current join-ordering algorithm is not able to produce the optimal plan > because it greedily adds one join at a time. After it has constructed the > partial plan that joins lineitem,part,order, the algorithm considers customer > and supplier as candidates for the next join (only these tables are > candidates due to the applicable join predicates). Since the resulting join > cardinality for customer and supplier is estimated to be equal, we > "arbitrarily" pick one and continue. > We should improve our join ordering algorithm to generate the optimal join > order for TPCH-Q8. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org