[
https://issues.apache.org/jira/browse/IMPALA-13542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Riza Suminto updated IMPALA-13542:
----------------------------------
Description:
In my recent perf-AB-test, I found that the 10% default selectivity can regress
the output cardinality estimate of an aggregation node that has a HAVING predicate.
https://gerrit.cloudera.org/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test#138
Attached are performance_result.txt and some profiles for comparison.
This was not an issue until now because NDV-based cardinality estimation often
overestimates already, so the 10% default selectivity still results in an
estimate higher than the actual runtime cardinality. But as the cardinality
estimate gets better, this 10% default selectivity can in turn cause an
underestimation. We should consider raising the default selectivity above
10% for HAVING predicates. 50% might be better.
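The arithmetic behind this can be sketched as follows. This is only an illustration, not Impala's planner code: the function name and all row counts are made up for the example, and the only values taken from this issue are the 10% and 50% selectivities.

```python
def estimate_output_cardinality(input_cardinality: int, selectivity: float) -> int:
    """Scale an estimated input cardinality by a predicate selectivity.

    Hypothetical helper for illustration; not an Impala API.
    """
    if input_cardinality < 0:
        raise ValueError("input_cardinality must be non-negative")
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be in [0, 1]")
    return round(input_cardinality * selectivity)


# Made-up numbers: suppose 400 groups actually survive the HAVING predicate.
actual_rows = 400

# An overestimated aggregation output (10,000 groups) hides the low selectivity:
# 10% of 10,000 is still above the actual row count.
old_estimate = estimate_output_cardinality(10_000, 0.10)

# With a tighter aggregation estimate (1,000 groups), 10% now underestimates,
# while 50% stays above the actual row count.
new_estimate_10 = estimate_output_cardinality(1_000, 0.10)
new_estimate_50 = estimate_output_cardinality(1_000, 0.50)

print(old_estimate, new_estimate_10, new_estimate_50, actual_rows)
```

In this sketch the 10% selectivity only becomes a problem once the input estimate approaches the real number of groups, which matches the behavior described above.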
was:
In my recent perf-AB-test, I found that the 10% default selectivity can regress
output cardinality estimation of aggregation node that has a HAVING predicate.
https://gerrit.cloudera.org/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test#138
Attached are performance_result.txt and some profiles for comparison.
This was not an issue until now since NDV based cardinality estimation ofter
overestimate already, such that the 10% default selectivity still results in
higher estimate compared to the actual runtime cardinality. But as cardinality
estimate gets better, this 10% default selectivity can in-turn cause an
underestimation. We should consider raising the default selectivity higher than
10% for HAVING predicates.
> Raise default selectivity for HAVING predicates
> -----------------------------------------------
>
> Key: IMPALA-13542
> URL: https://issues.apache.org/jira/browse/IMPALA-13542
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Riza Suminto
> Priority: Major
> Attachments: TPCDS-Q11_iter007-baseline.txt,
> TPCDS-Q11_iter007-test.txt, TPCDS-Q74_iter007-baseline.txt,
> TPCDS-Q74_iter007-test.txt, performance_result.txt
>