Tim Armstrong has posted comments on this change.
Change subject: IMPALA-3286: prefetching for PartitionedAggregationNode
......................................................................
Patch Set 5:
Rebased onto the latest PHJ (partitioned hash join) patch.
I benchmarked a slightly earlier version of this patch. The overall trend is
that high-NDV aggregations are much faster (about 17-25% on the targeted
queries) and low-NDV aggregations are slower (about 14-16%). On the
end-to-end TPC-H run this appears to be a small net win (-1.54% average):
Run Description: "Base: 7ad3faa4e3fa5b55b84ae3b2888caa3e4bdf8238 vs Ref: ded7fb79caf0466a22ab64cad18998f73cb38f3d"
Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version: impalad version 2.6.0-cdh5-INTERNAL RELEASE ()
Baseline Impala Version: impalad version 2.6.0-cdh5-INTERNAL RELEASE ()
+--------------------+-----------------------+---------+------------+------------+----------------+
| Workload           | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+--------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(_20) | parquet / none / none | 12.59   | -16.86%    | 7.48       | -7.39%         |
+--------------------+-----------------------+---------+------------+------------+----------------+
+--------------------+---------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload           | Query                                 | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+--------------------+---------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TARGETED-PERF(_20) | primitive_groupby_decimal_lowndv.test | parquet / none / none | 2.27   | 1.96        | +15.61%    | 1.36%     | 1.95%          | 1           | 20    |
| TARGETED-PERF(_20) | primitive_groupby_bigint_lowndv       | parquet / none / none | 2.27   | 1.99        | +13.81%    | 2.17%     | 2.87%          | 1           | 20    |
| TARGETED-PERF(_20) | primitive_groupby_bigint_pk           | parquet / none / none | 35.71  | 42.77       | -16.51%    | 6.66%     | 0.96%          | 1           | 20    |
| TARGETED-PERF(_20) | primitive_groupby_bigint_highndv      | parquet / none / none | 10.17  | 12.27       | -17.11%    | 1.70%     | 0.65%          | 1           | 20    |
| TARGETED-PERF(_20) | primitive_groupby_decimal_highndv     | parquet / none / none | 12.55  | 16.74       | I -25.04%  | 2.48%     | 1.57%          | 1           | 20    |
+--------------------+---------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
(I) Improvement: TARGETED-PERF(_20) primitive_groupby_decimal_highndv [parquet / none / none] (16.74s -> 12.55s [-25.04%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Rows   | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
| 03:AGGREGATE | 2.62%      | 330.04ms | 321.08ms | +2.79%     | 6.01%     | 379.54ms | 343.90ms | +10.36%    | 1      | 0       | 174.13K   |
| 01:AGGREGATE | 94.94%     | 11.96s   | 16.19s   | -26.14%    | 2.61%     | 12.66s   | 17.13s   | -26.10%    | 1      | 1.78M   | 1.74M     |
| 00:SCAN HDFS | 2.32%      | 291.90ms | 270.24ms | +8.01%     | 1.31%     | 299.88ms | 313.65ms | -4.39%     | 1      | 119.99M | 119.99M   |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+---------+-----------+
Report Generated on 2016-05-16
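
For anyone cross-checking the summary row, here is a minimal sketch of how
the Avg (s) and GeoMean(s) columns can be recomputed from the per-query
Avg(s) values in the TARGETED-PERF table above. The function names and the
constant are made up for illustration, and the report tool's exact formulas
are an assumption; this just happens to reproduce 12.59 and 7.48 to within
rounding.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of per-query average runtimes (seconds).
double ArithmeticMean(const std::vector<double>& xs) {
  assert(!xs.empty());
  double sum = 0.0;
  for (double x : xs) sum += x;
  return sum / xs.size();
}

// Geometric mean, accumulated in log space so long lists cannot overflow.
double GeometricMean(const std::vector<double>& xs) {
  assert(!xs.empty());
  double log_sum = 0.0;
  for (double x : xs) {
    assert(x > 0.0);  // log is only defined for positive runtimes
    log_sum += std::log(x);
  }
  return std::exp(log_sum / xs.size());
}

// Avg(s) column of the five TARGETED-PERF queries (illustrative constant).
const std::vector<double> kTargetedPerfAvg = {2.27, 2.27, 35.71, 10.17, 12.55};
```

The geometric mean is much lower than the arithmetic mean here because the
35.71s primary-key query dominates the plain average.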
Run Description: "Base: 7ad3faa4e3fa5b55b84ae3b2888caa3e4bdf8238 vs Ref: ded7fb79caf0466a22ab64cad18998f73cb38f3d"
Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version: impalad version 2.6.0-cdh5-INTERNAL RELEASE ()
Baseline Impala Version: impalad version 2.6.0-cdh5-INTERNAL RELEASE ()
+-----------+-----------------------+---------+------------+------------+----------------+
| Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-----------+-----------------------+---------+------------+------------+----------------+
| TPCH(_20) | parquet / none / none | 9.45    | -1.54%     | 6.38       | -1.31%         |
+-----------+-----------------------+---------+------------+------------+----------------+
+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| Workload  | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Num Clients | Iters |
+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
| TPCH(_20) | TPCH-Q1  | parquet / none / none | 12.36  | 11.66       | +6.02%     | 1.25%      | 1.56%          | 1           | 20    |
| TPCH(_20) | TPCH-Q17 | parquet / none / none | 14.11  | 13.62       | +3.60%     | 2.28%      | 2.04%          | 1           | 20    |
| TPCH(_20) | TPCH-Q5  | parquet / none / none | 6.37   | 6.16        | +3.35%     | 2.71%      | 1.75%          | 1           | 20    |
| TPCH(_20) | TPCH-Q3  | parquet / none / none | 5.04   | 4.91        | +2.70%     | 2.26%      | 2.13%          | 1           | 20    |
| TPCH(_20) | TPCH-Q15 | parquet / none / none | 5.01   | 4.95        | +1.36%     | 2.31%      | 2.39%          | 1           | 20    |
| TPCH(_20) | TPCH-Q12 | parquet / none / none | 4.25   | 4.20        | +1.15%     | 2.12%      | 2.21%          | 1           | 20    |
| TPCH(_20) | TPCH-Q14 | parquet / none / none | 3.46   | 3.42        | +0.94%     | 2.43%      | 2.15%          | 1           | 20    |
| TPCH(_20) | TPCH-Q4  | parquet / none / none | 4.02   | 3.98        | +0.90%     | * 23.81% * | * 24.67% *     | 1           | 20    |
| TPCH(_20) | TPCH-Q19 | parquet / none / none | 47.18  | 46.78       | +0.86%     | 2.02%      | 1.74%          | 1           | 20    |
| TPCH(_20) | TPCH-Q20 | parquet / none / none | 3.84   | 3.81        | +0.77%     | 2.23%      | 2.07%          | 1           | 20    |
| TPCH(_20) | TPCH-Q7  | parquet / none / none | 16.90  | 16.81       | +0.51%     | 0.94%      | 0.85%          | 1           | 20    |
| TPCH(_20) | TPCH-Q11 | parquet / none / none | 1.55   | 1.54        | +0.50%     | 2.79%      | 2.10%          | 1           | 20    |
| TPCH(_20) | TPCH-Q21 | parquet / none / none | 22.66  | 22.61       | +0.24%     | 0.67%      | 0.69%          | 1           | 20    |
| TPCH(_20) | TPCH-Q9  | parquet / none / none | 13.37  | 13.34       | +0.24%     | 0.52%      | 0.59%          | 1           | 20    |
| TPCH(_20) | TPCH-Q10 | parquet / none / none | 6.51   | 6.53        | -0.42%     | 1.71%      | 1.03%          | 1           | 20    |
| TPCH(_20) | TPCH-Q16 | parquet / none / none | 2.22   | 2.23        | -0.45%     | 2.62%      | 2.08%          | 1           | 20    |
| TPCH(_20) | TPCH-Q8  | parquet / none / none | 6.68   | 6.71        | -0.50%     | 3.96%      | 5.01%          | 1           | 20    |
| TPCH(_20) | TPCH-Q22 | parquet / none / none | 2.84   | 2.86        | -0.81%     | 2.44%      | 2.61%          | 1           | 20    |
| TPCH(_20) | TPCH-Q6  | parquet / none / none | 2.28   | 2.31        | -1.30%     | 1.41%      | 1.28%          | 1           | 20    |
| TPCH(_20) | TPCH-Q2  | parquet / none / none | 2.46   | 2.58        | -4.73%     | 1.72%      | 2.31%          | 1           | 20    |
| TPCH(_20) | TPCH-Q18 | parquet / none / none | 15.66  | 17.31       | -9.53%     | 4.42%      | 3.51%          | 1           | 20    |
| TPCH(_20) | TPCH-Q13 | parquet / none / none | 9.19   | 12.86       | I -28.57%  | 2.64%      | 4.52%          | 1           | 20    |
+-----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
(I) Improvement: TPCH(_20) TPCH-Q13 [parquet / none / none] (12.86s -> 9.19s [-28.57%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Rows  | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
| 04:AGGREGATE | 2.17%      | 208.45ms | 252.62ms | -17.48%    | 9.35%     | 254.88ms | 299.81ms | -14.99%    | 1      | 45     | 2.98M     |
| 03:AGGREGATE | 44.10%     | 4.24s    | 7.79s    | -45.57%    | 3.59%     | 4.62s    | 8.42s    | -45.08%    | 1      | 3.00M  | 2.98M     |
| 02:HASH JOIN | 40.34%     | 3.88s    | 4.05s    | -4.08%     | 3.20%     | 4.21s    | 4.48s    | -5.99%     | 1      | 30.68M | 3.00M     |
| 06:EXCHANGE  | 3.67%      | 353.15ms | 293.33ms | +20.39%    | 5.24%     | 399.72ms | 302.01ms | +32.35%    | 1      | 29.68M | 3.00M     |
| 01:SCAN HDFS | 8.08%      | 776.89ms | 782.61ms | -0.73%     | 4.39%     | 857.62ms | 900.48ms | -4.76%     | 1      | 29.68M | 3.00M     |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+--------+-----------+
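
As a reading aid for the Delta(Avg) columns, the sketch below computes the
relative change against the baseline. The helper name is hypothetical, and
the formula is an assumption about how the report tool derives the column.
Recomputing from the rounded values printed here lands within a few
hundredths of a percent of the reported deltas (for example, 7.79s -> 4.24s
for 03:AGGREGATE gives about -45.57%), presumably because the tool works
from unrounded timings.

```cpp
#include <cassert>

// Relative change of `avg` versus `base_avg`, in percent.
// Negative means the patched build is faster than the baseline.
double DeltaPercent(double base_avg, double avg) {
  assert(base_avg > 0.0);
  return (avg - base_avg) / base_avg * 100.0;
}
```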
(V) Significant Variability: TPCH(_20) TPCH-Q4 [parquet / none / none] (24.67% -> 23.81%)
+--------------+------------+-----------+----------------+------------------+--------+-------+-----------+
| Operator     | % of Query | StdDev(%) | Base StdDev(%) | Delta(StdDev(%)) | #Hosts | #Rows | Est #Rows |
+--------------+------------+-----------+----------------+------------------+--------+-------+-----------+
| 08:AGGREGATE | 3.51%      | 20.08%    | 18.81%         | +6.74%           | 1      | 5     | 5         |
| 03:AGGREGATE | 9.21%      | 12.80%    | 13.21%         | -3.09%           | 1      | 5     | 5         |
| 02:HASH JOIN | 19.68%     | 35.76%    | 32.07%         | +11.51%          | 1      | 1.05M | 3.00M     |
| 00:SCAN HDFS | 17.27%     | 14.81%    | 15.14%         | -2.21%           | 1      | 1.15M | 3.00M     |
| 05:EXCHANGE  | 4.31%      | 60.05%    | 58.61%         | +2.45%           | 1      | 6.02M | 12.00M    |
+--------------+------------+-----------+----------------+------------------+--------+-------+-----------+
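
For context on the high-NDV vs low-NDV split, here is a rough sketch of the
general batch-prefetching technique. This is not the code in this change:
TinyAggTable, Bucket, and the hash function are invented for illustration,
and it assumes GCC or Clang for __builtin_prefetch. The first pass hashes
every row in the batch and issues a prefetch for its bucket; the second
pass does the actual probe and update. A plausible reading of the numbers
above is that with many distinct groups the table outgrows the CPU caches,
so the prefetches hide cache misses, while with few groups the buckets are
already cached and the extra pass is pure overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bucket for SUM(value) GROUP BY key; layout is illustrative.
struct Bucket {
  bool filled = false;
  int64_t key = 0;
  int64_t sum = 0;
};

class TinyAggTable {
 public:
  // num_buckets must be a power of two and exceed the number of groups,
  // since this sketch never resizes.
  explicit TinyAggTable(size_t num_buckets) : buckets_(num_buckets) {
    assert(num_buckets > 0 && (num_buckets & (num_buckets - 1)) == 0);
  }

  // Processes one batch in two passes: hash + prefetch, then probe + update.
  void AddBatch(const std::vector<int64_t>& keys,
                const std::vector<int64_t>& values) {
    assert(keys.size() == values.size());
    std::vector<size_t> slots(keys.size());
    for (size_t i = 0; i < keys.size(); ++i) {
      slots[i] = Hash(keys[i]) & (buckets_.size() - 1);
      // Hint that this bucket will be read and written shortly.
      __builtin_prefetch(&buckets_[slots[i]], 1);
    }
    for (size_t i = 0; i < keys.size(); ++i) {
      size_t slot = slots[i];
      // Linear probing until we hit the key or an empty bucket.
      while (buckets_[slot].filled && buckets_[slot].key != keys[i]) {
        slot = (slot + 1) & (buckets_.size() - 1);
      }
      buckets_[slot].filled = true;
      buckets_[slot].key = keys[i];
      buckets_[slot].sum += values[i];
    }
  }

  // Returns the running sum for `key`, or 0 if the key was never added.
  int64_t Sum(int64_t key) const {
    size_t slot = Hash(key) & (buckets_.size() - 1);
    while (buckets_[slot].filled) {
      if (buckets_[slot].key == key) return buckets_[slot].sum;
      slot = (slot + 1) & (buckets_.size() - 1);
    }
    return 0;
  }

 private:
  // Multiplicative mixer; any reasonable hash would do for the sketch.
  static uint64_t Hash(int64_t key) {
    uint64_t h = static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
    return h ^ (h >> 32);
  }

  std::vector<Bucket> buckets_;
};
```

Splitting the loop this way gives the memory system a whole batch of
independent loads to overlap, at the cost of touching each row twice.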
--
To view, visit http://gerrit.cloudera.org:8080/3070
Gerrit-MessageType: comment
Gerrit-Change-Id: I7726454efb416d61080c4e11db0ee7ada18c149b
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: No