[ 
https://issues.apache.org/jira/browse/IMPALA-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2796:
-------------------------------------

    Assignee:     (was: Alexander Behm)

> Suboptimal join ordering due to greedy candidate selection.
> -----------------------------------------------------------
>
>                 Key: IMPALA-2796
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2796
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.2, Impala 2.3.0
>            Reporter: Alexander Behm
>            Priority: Minor
>              Labels: performance, planning
>
> Consider the join order that we generate for TPCH-Q8 after fixing IMPALA-976:
> {code}
> hash join *region* on n1.n_regionkey = r_regionkey
> hash join nation n1 on c_nationkey = n1.n_nationkey
> hash join customer o_custkey = c_custkey
> hash join nation n2 s_nationkey = n2.c_nationkey
> hash join supplier on l_suppkey = s_suppkey
> hash join *order* l_orderkey = o_orderkey
> hash join *part* l_partkey = p_partkey
> lineitem
> {code}
> In the pseudo-plan above, the tables in bold have selective predicates 
> applied to the scans.
> Contrast the plan above with the following optimal join order:
> {code}
> hash join nation n2 s_nationkey = n2.c_nationkey
> hash join supplier on l_suppkey = s_suppkey
> hash join *region* on n1.n_regionkey = r_regionkey
> hash join nation n1 on c_nationkey = n1.n_nationkey
> hash join customer o_custkey = c_custkey
> hash join *order* l_orderkey = o_orderkey
> hash join *part* l_partkey = p_partkey
> lineitem
> {code}
> This plan is better because the number of intermediate results are reduced by 
> executing the join on region first. The difference between the two plans is 
> that the following series of joins are "swapped":
> This series of joins leads up to a selective join with region, and should 
> come before the block with supplier and n2.
> {code}
> hash join *region* on n1.n_regionkey = r_regionkey
> hash join nation n1 on c_nationkey = n1.n_nationkey
> hash join customer o_custkey = c_custkey
> {code}
> These series of joins are not selective and should come last.
> {code}
> hash join nation n2 s_nationkey = n2.c_nationkey
> hash join supplier on l_suppkey = s_suppkey
> {code}
> Our current join-ordering algorithm is not able to produce the optimal plan 
> because it greedily adds one join at a time. After it has constructed the 
> partial plan that joins lineitem,part,order, the algorithm considers customer 
> and supplier as candidates for the next join (only these tables are 
> candidates due to the applicable join predicates). Since the resulting join 
> cardinality for customer and supplier is estimated to be equal, we 
> "arbitrarily" pick one and continue.
> We should improve our join ordering algorithm to generate the optimal join 
> order for TPCH-Q8.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to