Hello Dan Hecht, Tim Armstrong,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/2896
to look at the new patch set (#10).
Change subject: IMPALA-3286: Software prefetching for hash table build.
......................................................................
IMPALA-3286: Software prefetching for hash table build.
This change pipelines the code that builds the hash table,
based on an idea Mostafa presented earlier. The pipelined
code makes two passes over the rows to be inserted. The first
pass evaluates every row, computes its hash value and
prefetches the corresponding hash table bucket. The second
pass goes through the rows again and inserts them into the
hash table. This change also introduces lazy evaluation of
the build-side expressions in Equals(), so that they are not
evaluated a second time when the hash table bucket is empty
or when the bucket's hash does not match because of a
collision.
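The two-pass idea described above can be illustrated with a minimal,
self-contained sketch. This is not the code from this patch: the names
Bucket, SketchHashTable and InsertBatch are hypothetical, the table uses
plain linear probing over int64_t keys, and duplicate keys are simply
skipped, all of which are simplifying assumptions. The prefetch uses the
GCC/Clang builtin __builtin_prefetch and compiles to a no-op elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified bucket: the hash is stored next to the key so a
// probe can reject most mismatches without doing the full key comparison.
struct Bucket {
  bool filled = false;
  uint32_t hash = 0;
  int64_t key = 0;
};

class SketchHashTable {
 public:
  // num_buckets must be a power of two so (hash & mask) selects a bucket.
  explicit SketchHashTable(size_t num_buckets) : buckets_(num_buckets) {
    assert(num_buckets > 0 && (num_buckets & (num_buckets - 1)) == 0);
  }

  // Pass 1 hashes every key and prefetches its first bucket.
  // Pass 2 inserts using the cached hashes, by which time the buckets are
  // more likely to be in cache. Returns the number of keys newly inserted.
  size_t InsertBatch(const std::vector<int64_t>& keys) {
    std::vector<uint32_t> hashes(keys.size());
    const size_t mask = buckets_.size() - 1;
    for (size_t i = 0; i < keys.size(); ++i) {
      hashes[i] = Hash(keys[i]);
      Prefetch(&buckets_[hashes[i] & mask]);
    }
    size_t inserted = 0;
    for (size_t i = 0; i < keys.size(); ++i) {
      if (Insert(keys[i], hashes[i])) ++inserted;
    }
    return inserted;
  }

  bool Contains(int64_t key) const {
    const size_t idx = Probe(key, Hash(key));
    return idx != buckets_.size() && buckets_[idx].filled;
  }

  size_t size() const { return size_; }

 private:
  // Multiplicative hash; chosen only to keep the sketch short.
  static uint32_t Hash(int64_t key) {
    const uint64_t x = static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
    return static_cast<uint32_t>(x >> 32);
  }

  static void Prefetch(const void* addr) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr);
#else
    (void)addr;  // No portable prefetch; the sketch still works without it.
#endif
  }

  // Returns the index of the bucket holding `key`, or of the first empty
  // bucket on its probe sequence, or buckets_.size() if the table is full.
  // The key comparison (the stand-in for Equals()) is reached only when the
  // bucket is filled and the stored hash matches.
  size_t Probe(int64_t key, uint32_t hash) const {
    const size_t mask = buckets_.size() - 1;
    size_t idx = hash & mask;
    for (size_t step = 0; step < buckets_.size(); ++step) {
      const Bucket& b = buckets_[idx];
      if (!b.filled) return idx;
      if (b.hash == hash && b.key == key) return idx;
      idx = (idx + 1) & mask;
    }
    return buckets_.size();
  }

  bool Insert(int64_t key, uint32_t hash) {
    const size_t idx = Probe(key, hash);
    if (idx == buckets_.size()) return false;  // Table is full.
    Bucket& b = buckets_[idx];
    if (b.filled) return false;  // Key already present; skipped in this sketch.
    b.filled = true;
    b.hash = hash;
    b.key = key;
    ++size_;
    return true;
  }

  std::vector<Bucket> buckets_;
  size_t size_ = 0;
};
```

In this sketch, a caller would construct SketchHashTable with a
power-of-two bucket count and pass each batch of keys to InsertBatch().
Caching the hashes from the first pass means the second pass neither
rehashes a key nor compares keys unless a filled bucket has the same
stored hash.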
With this change, the hash table build time of a self-join on
lineitem drops by more than half (from 10.5s to 4.5s), and the
overall time of the following query drops from 37.28s to 31.15s
(~16% reduction):

  select count(*) from lineitem o1, lineitem o2
  where o1.l_orderkey = o2.l_orderkey and
        o1.l_linenumber = o2.l_linenumber

TPCH(15) also improves by about 2.5% overall (average run time),
with individual queries improving by up to 8%:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(15) | parquet / none / none | 14.34   | -2.49%     | 9.36       | -1.65%         |
+----------+-----------------------+---------+------------+------------+----------------+
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH(15) | TPCH-Q1  | parquet / none / none | 8.44   | 8.05        | +4.92%     | 2.89%     | 1.50%          | 1           | 10    |
| TPCH(15) | TPCH-Q11 | parquet / none / none | 1.85   | 1.76        | +4.86%     | 3.88%     | 3.93%          | 1           | 10    |
| TPCH(15) | TPCH-Q2  | parquet / none / none | 2.90   | 2.78        | +4.41%     | 8.68%     | * 15.78% *     | 1           | 10    |
| TPCH(15) | TPCH-Q19 | parquet / none / none | 39.46  | 38.53       | +2.40%     | 2.21%     | 2.23%          | 1           | 10    |
| TPCH(15) | TPCH-Q16 | parquet / none / none | 1.90   | 1.86        | +1.81%     | 2.54%     | 2.74%          | 1           | 10    |
| TPCH(15) | TPCH-Q15 | parquet / none / none | 5.50   | 5.43        | +1.32%     | 2.62%     | 3.34%          | 1           | 10    |
| TPCH(15) | TPCH-Q6  | parquet / none / none | 3.03   | 3.01        | +0.61%     | 3.54%     | 2.14%          | 1           | 10    |
| TPCH(15) | TPCH-Q17 | parquet / none / none | 31.22  | 31.13       | +0.29%     | 0.32%     | 0.49%          | 1           | 10    |
| TPCH(15) | TPCH-Q14 | parquet / none / none | 3.63   | 3.64        | -0.21%     | 2.22%     | 2.70%          | 1           | 10    |
| TPCH(15) | TPCH-Q12 | parquet / none / none | 3.88   | 3.89        | -0.31%     | 1.90%     | 1.82%          | 1           | 10    |
| TPCH(15) | TPCH-Q7  | parquet / none / none | 26.25  | 26.64       | -1.50%     | 2.30%     | 2.40%          | 1           | 10    |
| TPCH(15) | TPCH-Q20 | parquet / none / none | 6.26   | 6.42        | -2.45%     | 1.44%     | 1.81%          | 1           | 10    |
| TPCH(15) | TPCH-Q9  | parquet / none / none | 30.56  | 31.43       | -2.77%     | 0.41%     | 0.64%          | 1           | 10    |
| TPCH(15) | TPCH-Q13 | parquet / none / none | 13.53  | 13.94       | -3.00%     | 1.02%     | 0.50%          | 1           | 10    |
| TPCH(15) | TPCH-Q8  | parquet / none / none | 24.93  | 25.76       | -3.22%     | 0.95%     | 1.00%          | 1           | 10    |
| TPCH(15) | TPCH-Q10 | parquet / none / none | 6.58   | 6.89        | -4.50%     | 1.37%     | 1.24%          | 1           | 10    |
| TPCH(15) | TPCH-Q18 | parquet / none / none | 31.44  | 33.12       | -5.05%     | 0.50%     | 0.66%          | 1           | 10    |
| TPCH(15) | TPCH-Q21 | parquet / none / none | 31.56  | 33.55       | -5.92%     | 4.31%     | 5.01%          | 1           | 10    |
| TPCH(15) | TPCH-Q22 | parquet / none / none | 4.17   | 4.44        | -5.98%     | 0.59%     | 0.75%          | 1           | 10    |
| TPCH(15) | TPCH-Q5  | parquet / none / none | 14.67  | 15.66       | -6.34%     | 8.08%     | 1.13%          | 1           | 10    |
| TPCH(15) | TPCH-Q3  | parquet / none / none | 11.25  | 12.01       | -6.38%     | 1.17%     | 0.85%          | 1           | 10    |
| TPCH(15) | TPCH-Q4  | parquet / none / none | 12.38  | 13.49       | -8.19%     | 1.44%     | 0.70%          | 1           | 10    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
---
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.h
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/runtime/row-batch.h
10 files changed, 190 insertions(+), 121 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/96/2896/10
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib85e7fc162ad25c849b9e716b629e226697cd940
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>