Riza Suminto created IMPALA-13542:
-------------------------------------
Summary: Raise default selectivity for HAVING predicates
Key: IMPALA-13542
URL: https://issues.apache.org/jira/browse/IMPALA-13542
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Riza Suminto
Attachments: TPCDS-Q11_iter007-baseline.txt,
TPCDS-Q11_iter007-test.txt, TPCDS-Q74_iter007-baseline.txt,
TPCDS-Q74_iter007-test.txt, performance_result.txt
In a recent perf A/B test, I found that the 10% default selectivity can regress
the output cardinality estimate of an aggregation node that has a HAVING predicate.
https://gerrit.cloudera.org/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test#138
Attached are performance_result.txt and some profiles for comparison.
This was not an issue until now because NDV-based cardinality estimation often
overestimates already, so even with the 10% default selectivity the estimate
stays higher than the actual runtime cardinality. But as cardinality estimates
get better, this 10% default selectivity can in turn cause underestimation.
We should consider raising the default selectivity above 10% for HAVING
predicates.
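
To make the arithmetic concrete, here is a minimal sketch of the effect described above. It is not Impala planner code: the function name, the group counts, and the 60% pass rate are hypothetical values chosen for illustration and are not taken from the attached profiles. It only assumes the estimate is the aggregation's estimated group count multiplied by the default selectivity.

```python
# Illustrative only: a toy model, not Impala's planner implementation.
DEFAULT_SELECTIVITY = 0.10  # the 10% default discussed in this ticket


def having_output_estimate(agg_groups: int, selectivity: float) -> int:
    """Estimated rows that survive a HAVING predicate (hypothetical helper)."""
    return max(1, round(agg_groups * selectivity))


# Hypothetical runtime truth: 1M groups, of which the HAVING predicate keeps 60%.
actual_groups = 1_000_000
actual_output = 600_000

# Case 1: the NDV-based group estimate is 10x too high (assumed factor).
# The low default selectivity happens to compensate for the overestimate.
overestimated_groups = 10_000_000
est_with_overestimate = having_output_estimate(
    overestimated_groups, DEFAULT_SELECTIVITY)

# Case 2: the group estimate is accurate.
# The same 10% default now pushes the output estimate below the actual value.
est_with_accurate_input = having_output_estimate(
    actual_groups, DEFAULT_SELECTIVITY)

print(f"actual output rows:             {actual_output}")
print(f"estimate, overestimated groups: {est_with_overestimate}")
print(f"estimate, accurate groups:      {est_with_accurate_input}")
```

In this toy setup the first estimate stays above the actual output while the second falls well below it, which is the underestimation pattern the ticket is concerned about.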