[ 
https://issues.apache.org/jira/browse/IMPALA-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948206#comment-16948206
 ] 

ASF subversion and git services commented on IMPALA-9015:
---------------------------------------------------------

Commit 0dff5ef3621a0be907c6968096b643a5b2d9f30c in impala's branch 
refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0dff5ef ]

IMPALA-9015: improve mt_dop scan scheduling

Implement longest-processing time algorithm for assigning scan ranges
to instances within a host. This is a standard algorithm that works
well in practice and solves some specific bugs in the current
algorithm.

The previous approach tended to assign multiple ranges to the
first instance and induce skew. E.g. if the ranges were
[3, 4, 5, 6] and it had 4 instances, it would assign
[3, 4], [5], [6], []. This also had the unfortunate consequence
that not all instances actually got allocated scan ranges,
making scheduling hard to reason about.

Testing:
Added a unit test for the core algorithm that checks directly
that it fixes that above problems.

Perf:
The algorithm is O(n log n) instead of O(n), where n is the
number of scan ranges allocated to a backend. This seems
worthwhile to get more even work distribution because
the payoff can be significant in may cases.

I ran a single node perf run of TPC-H scale 30 with
mt_dop=4 and saw significant perf improvements:

Report Generated on 2019-10-07
Run Description: "ccd741856d0e18be9c89087da71d9fc59f1c75ad vs 
37d3df99e588b0e9080660c472c3bb06a08361ac"

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:          impalad version 3.4.0-SNAPSHOT RELEASE ()
Baseline Impala Version: impalad version 3.4.0-SNAPSHOT RELEASE ()

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 8.51    | -3.78%     | 5.70       | -4.07% 
        |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | 
Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | 
Tval   |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TPCH(30) | TPCH-Q11 | parquet / none / none | 3.08   | 3.04        |   +1.11% 
  |   0.72%   |   1.09%        | 5     |   +1.52%       | 0.58    | 1.88   |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 1.83   | 1.83        |   -0.01% 
  |   3.68%   |   1.25%        | 5     |   +2.00%       | 0.00    | -0.00  |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 36.29  | 36.17       |   +0.32% 
  |   2.22%   |   2.94%        | 5     |   +0.56%       | 0.00    | 0.20   |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.80   | 5.93        |   -2.06% 
  |   0.61%   |   0.94%        | 5     |   -1.74%       | -2.02   | -4.14  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 8.91   | 9.15        |   -2.62% 
  |   3.54%   |   3.96%        | 5     |   -2.17%       | -0.87   | -1.12  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.38   | 3.46        |   -2.30% 
  |   0.82%   |   0.68%        | 5     |   -2.54%       | -2.31   | -4.89  |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.62   | 1.67        |   -2.73% 
  |   2.52%   |   0.12%        | 5     |   -2.88%       | -1.15   | -2.49  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.67   | 3.78        |   -3.06% 
  |   1.12%   |   1.69%        | 5     |   -2.69%       | -2.31   | -3.41  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 3.08   | 3.18        |   -3.05% 
  |   1.22%   |   0.32%        | 5     |   -2.86%       | -2.31   | -5.57  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 5.52   | 5.72        |   -3.56% 
  |   1.35%   |   1.02%        | 5     |   -3.60%       | -2.31   | -4.81  |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.32   | 2.41        |   -3.86% 
  |   1.04%   |   1.64%        | 5     |   -3.99%       | -2.02   | -4.50  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 24.22  | 25.27       |   -4.15% 
  |   0.40%   |   2.01%        | 5     |   -3.74%       | -2.31   | -4.55  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 8.64   | 8.99        |   -3.97% 
  |   1.02%   |   0.47%        | 5     |   -4.27%       | -2.31   | -8.15  |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.38   | 3.53        |   -4.28% 
  |   1.67%   |   1.39%        | 5     |   -4.26%       | -2.31   | -4.51  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 11.77  | 12.28       |   -4.18% 
  |   0.72%   |   0.94%        | 5     |   -4.62%       | -2.31   | -8.06  |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 7.65   | 8.05        |   -4.98% 
  |   1.03%   |   2.09%        | 5     |   -4.68%       | -2.31   | -4.82  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 27.97  | 29.52       | I -5.27% 
  |   0.78%   |   0.39%        | 5     | I -5.16%       | -2.31   | -14.17 |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 4.92   | 5.24        | I -6.05% 
  |   3.59%   |   2.32%        | 5     | I -6.79%       | -1.73   | -3.30  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.56   | 4.92        | I -7.33% 
  |   1.55%   |   0.90%        | 5     | I -8.50%       | -2.31   | -9.64  |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.09   | 2.28        | I -8.40% 
  |   1.95%   |   1.85%        | 5     | I -9.21%       | -2.31   | -7.29  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 10.83  | 11.84       | I -8.55% 
  |   0.53%   |   0.57%        | 5     | I -9.34%       | -2.31   | -25.56 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 5.63   | 6.22        | I -9.53% 
  |   0.75%   |   0.73%        | 5     | I -10.46%      | -2.31   | -21.37 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

Change-Id: I45ed2dab835efeb64bb74891cb43065894892682
Reviewed-on: http://gerrit.cloudera.org:8080/14381
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Use better algorithm for allocating scan ranges to finstances within a daemon 
> in schedule 
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-9015
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9015
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> Currently the scheduler uses a single-pass algorithm to allocate scan ranges 
> to nodes. It has several deficiencies:
> * It doesn't guarantee that the desired number of instances are created. E.g. 
> if mt_dop is 4, there are 3 impalads and 9 scan ranges, it should create 3 
> instances per impala but doesn't reliably.
> * It tends to over-allocate to the first instances it visits.
> * The result depends quite a bit on the input order of the scan ranges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to