Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-2805: Order conjuncts based on selectivity and cost ......................................................................
Patch Set 8: I had already run the benchmark three time on essentially identical code (I was really confused by the initial results I was getting before Tim pointed out I was comparing to the wrong baseline, and was checking if it was reproducible). The results for the two runs that I hadn't posted before: +------------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +------------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ | TPCH(_300) | TPCH-Q11 | parquet / none / none | 7.01 | 6.58 | +6.56% | 5.16% | 1.53% | 1 | 3 | | TPCH(_300) | TPCH-Q12 | parquet / none / none | 17.29 | 16.38 | +5.59% | 4.08% | 4.59% | 1 | 3 | | TPCH(_300) | TPCH-Q16 | parquet / none / none | 10.19 | 9.76 | +4.49% | 0.92% | 0.58% | 1 | 3 | | TPCH(_300) | TPCH-Q2 | parquet / none / none | 9.09 | 8.72 | +4.33% | 3.03% | 0.74% | 1 | 3 | | TPCH(_300) | TPCH-Q15 | parquet / none / none | 6.30 | 6.08 | +3.75% | 3.19% | 0.44% | 1 | 3 | | TPCH(_300) | TPCH-Q9 | parquet / none / none | 41.65 | 40.63 | +2.51% | 3.62% | 1.81% | 1 | 3 | | TPCH(_300) | TPCH-Q4 | parquet / none / none | 34.64 | 33.84 | +2.38% | 0.28% | 1.39% | 1 | 3 | | TPCH(_300) | TPCH-Q21 | parquet / none / none | 122.32 | 119.52 | +2.34% | 0.68% | 0.67% | 1 | 3 | | TPCH(_300) | TPCH-Q20 | parquet / none / none | 16.13 | 16.01 | +0.72% | 1.50% | 0.09% | 1 | 3 | | TPCH(_300) | TPCH-Q7 | parquet / none / none | 48.68 | 48.96 | -0.55% | 5.07% | 1.29% | 1 | 3 | | TPCH(_300) | TPCH-Q1 | parquet / none / none | 19.13 | 19.24 | -0.57% | 0.07% | 0.82% | 1 | 3 | | TPCH(_300) | TPCH-Q10 | parquet / none / none | 20.94 | 21.08 | -0.65% | 0.48% | 0.13% | 1 | 3 | | TPCH(_300) | TPCH-Q8 | parquet / none / none | 8.88 | 8.95 | -0.78% | 1.79% | 2.20% | 1 | 3 | | TPCH(_300) | TPCH-Q5 | parquet / none / none | 48.77 | 49.31 | -1.10% | 0.10% | 1.37% | 1 | 3 | | TPCH(_300) | TPCH-Q18 | parquet / none / none | 83.98 | 86.03 | -2.39% | 1.24% | 0.83% | 1 | 3 | | TPCH(_300) | TPCH-Q19 | parquet / none / none | 155.42 | 160.06 | -2.90% | 0.04% | 4.56% | 1 | 3 | | TPCH(_300) | TPCH-Q17 | parquet / none / none | 119.79 | 124.53 | -3.80% | 3.50% | 4.26% | 1 | 3 | | TPCH(_300) | TPCH-Q13 | parquet / none / none | 34.31 | 35.98 | -4.62% | 1.30% | 2.35% | 1 | 3 | | TPCH(_300) | TPCH-Q22 | parquet / none / none | 7.86 | 8.26 | -4.92% | 3.56% | 4.04% | 1 | 3 | | TPCH(_300) | TPCH-Q3 | parquet / none / none | 32.46 | 34.31 | -5.40% | 0.06% | 3.31% | 1 | 3 | | TPCH(_300) | TPCH-Q14 | parquet / none / none | 6.01 | 6.44 | -6.79% | 2.61% | 2.01% | 1 | 3 | | TPCH(_300) | TPCH-Q6 | parquet / none / none | 3.49 | 3.87 | -9.82% | 0.80% | 2.61% | 1 | 3 | +------------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ +------------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ | Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +------------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ | TPCH(_300) | TPCH-Q17 | parquet / none / none | 133.67 | 124.53 | +7.34% | 3.14% | 4.26% | 1 | 3 | | TPCH(_300) | TPCH-Q12 | parquet / none / none | 16.95 | 16.38 | +3.48% | 4.37% | 4.59% | 1 | 3 | | TPCH(_300) | TPCH-Q16 | parquet / none / none | 10.07 | 9.76 | +3.26% | 0.10% | 0.58% | 1 | 3 | | TPCH(_300) | TPCH-Q21 | parquet / none / none | 123.07 | 119.52 | +2.97% | 0.51% | 0.67% | 1 | 3 | | TPCH(_300) | TPCH-Q4 | parquet / none / none | 34.76 | 33.84 | +2.73% | 0.76% | 1.39% | 1 | 3 | | TPCH(_300) | TPCH-Q20 | parquet / none / none | 16.29 | 16.01 | +1.71% | 1.13% | 0.09% | 1 | 3 | | TPCH(_300) | TPCH-Q6 | parquet / none / none | 3.91 | 3.87 | +1.26% | 6.51% | 2.61% | 1 | 3 | | TPCH(_300) | TPCH-Q8 | parquet / none / none | 9.01 | 8.95 | +0.66% | 5.57% | 2.20% | 1 | 3 | | TPCH(_300) | TPCH-Q9 | parquet / none / none | 40.63 | 40.63 | -0.00% | 1.55% | 1.81% | 1 | 3 | | TPCH(_300) | TPCH-Q11 | parquet / none / none | 6.57 | 6.58 | -0.15% | 0.00% | 1.53% | 1 | 3 | | TPCH(_300) | TPCH-Q1 | parquet / none / none | 19.11 | 19.24 | -0.64% | 0.68% | 0.82% | 1 | 3 | | TPCH(_300) | TPCH-Q18 | parquet / none / none | 85.26 | 86.03 | -0.90% | 0.24% | 0.83% | 1 | 3 | | TPCH(_300) | TPCH-Q2 | parquet / none / none | 8.62 | 8.72 | -1.05% | 1.10% | 0.74% | 1 | 3 | | TPCH(_300) | TPCH-Q15 | parquet / none / none | 6.01 | 6.08 | -1.16% | 4.28% | 0.44% | 1 | 3 | | TPCH(_300) | TPCH-Q5 | parquet / none / none | 48.43 | 49.31 | -1.78% | 0.30% | 1.37% | 1 | 3 | | TPCH(_300) | TPCH-Q3 | parquet / none / none | 33.34 | 34.31 | -2.83% | 2.10% | 3.31% | 1 | 3 | | TPCH(_300) | TPCH-Q7 | parquet / none / none | 46.77 | 48.96 | -4.46% | 6.47% | 1.29% | 1 | 3 | | TPCH(_300) | TPCH-Q10 | parquet / none / none | 20.10 | 21.08 | -4.64% | 0.66% | 0.13% | 1 | 3 | | TPCH(_300) | TPCH-Q19 | parquet / none / none | 151.34 | 160.06 | -5.45% | 0.73% | 4.56% | 1 | 3 | | TPCH(_300) | TPCH-Q14 | parquet / none / none | 5.99 | 6.44 | -7.02% | 0.34% | 2.01% | 1 | 3 | | TPCH(_300) | TPCH-Q22 | parquet / none / none | 7.62 | 8.26 | -7.84% | 1.01% | 4.04% | 1 | 3 | | TPCH(_300) | TPCH-Q13 | parquet / none / none | 29.96 | 35.98 | -16.73% | 1.08% | 2.35% | 1 | 3 | +------------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+ There's only one query that is both consistently slower and is actually affected by this change, and that's Q16. We change the order of the conjuncts from: predicates: p_brand != 'Brand#45', NOT p_type LIKE 'MEDIUM POLISHED%', p_size IN (49, 14, 23, 45, 19, 3, 36, 9) to: predicates: p_size IN (49, 14, 23, 45, 19, 3, 36, 9), p_brand != 'Brand#45', NOT p_type LIKE 'MEDIUM POLISHED%' Our cost model actually calculates the IN conjunct as being more expensive than the other conjuncts, but we end up with this ordering because the IN conjunct has a selectivity estimate and the other two don't, so while this may be a real regression its hard to fix without either changing our treatment of conjuncts that do or don't have selectivities or adding calculation of selectivities for more types of Exprs. -- To view, visit http://gerrit.cloudera.org:8080/2598 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02279a26fbc6308ac5eb819d78345fc010469034 Gerrit-PatchSet: 8 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: No
