[ 
https://issues.apache.org/jira/browse/IMPALA-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5612.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0


IMPALA-5612: join inversion should factor in parallelism

The join inversion optimisation did not factor in the degree of
parallelism that the join executed with after inversion. In some cases
this lead to bad decisions, e.g. executing a join on a single node
instead of 20 nodes.

This patch adds a more sophisticated cost model that factors degree
of parallelism into the join inversion decision.

The behaviour is unchanged if inversion does not change the degree of
parallelism.

Perf:
Ran cluster TPC-H and TPC-DS benchmarks. Average changes were small:
< 3%. Saw a mix of improvements and regressions. We were satisfied
that the regressions were cases when the planner "got lucky" previously.
E.g. on TPC-H Q2 a join was flipped to put lineitem on the left as a
result of inaccurate cardinality estimates.

Mostafa also ran a TPC-DS benchmark where the dimension tables were
loaded with num_nodes=1 to minimise the number of files. We saw some
huge speedups there on the unmodified queries, e.g. TPCDS-Q10 went from
291s to 32.25s. The worst percentage regression was Q50, which went
from 1.61s to 2.4s and the worst absolute regression was Q72, which
went from 694s to 874s (25%).

Change-Id: Icacea4565ce25ef15aaab014684c9440dd501d4e
Reviewed-on: http://gerrit.cloudera.org:8080/7351
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins

> Join inversion should avoid reducing the degree of parallelism
> --------------------------------------------------------------
>
>                 Key: IMPALA-5612
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5612
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
>            Reporter: Alexander Behm
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: performance, planner, ramp-up
>             Fix For: Impala 2.10.0
>
>
> The degree of inter-node parallelism for a join is determined by its left 
> input, so when inverting a join the planner should be mindful of how the 
> inversion affects parallelism.
> For example, the left join input may have been reduced by joining with 
> several dimension tables so much that it becomes smaller than the right 
> hand-side (another small dimension table). By inverting such a join the 
> degree of parallelism may be reduced to one or very few nodes, based on how 
> many nodes the right-hand size is executed on.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to