[ https://issues.apache.org/jira/browse/IMPALA-13542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Riza Suminto updated IMPALA-13542:
----------------------------------
    Description: 
In my recent perf-AB-test, I found that the 10% default selectivity can regress 
the output cardinality estimate of an aggregation node that has a HAVING 
predicate.
https://gerrit.cloudera.org/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test#138

Attached are performance_result.txt and some profiles for comparison.

This was not an issue until now because NDV-based cardinality estimation often 
overestimates already, so the 10% default selectivity still results in a 
higher estimate than the actual runtime cardinality. But as the cardinality 
estimate gets better, this 10% default selectivity can in turn cause an 
underestimation. We should consider raising the default selectivity above 
10% for HAVING predicates. 50% might be better.

  was:
In my recent perf-AB-test, I found that the 10% default selectivity can regress 
output cardinality estimation of aggregation node that has a HAVING predicate.
https://gerrit.cloudera.org/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test#138

Attached are performance_result.txt and some profiles for comparison.

This was not an issue until now since NDV based cardinality estimation ofter 
overestimate already, such that the 10% default selectivity still results in 
higher estimate compared to the actual runtime cardinality. But as cardinality 
estimate gets better, this 10% default selectivity  can in-turn cause an 
underestimation. We should consider raising the default selectivity higher than 
10% for HAVING predicates.


> Raise default selectivity for HAVING predicates
> -----------------------------------------------
>
>                 Key: IMPALA-13542
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13542
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Riza Suminto
>            Priority: Major
>         Attachments: TPCDS-Q11_iter007-baseline.txt, 
> TPCDS-Q11_iter007-test.txt, TPCDS-Q74_iter007-baseline.txt, 
> TPCDS-Q74_iter007-test.txt, performance_result.txt
>
>
> In my recent perf-AB-test, I found that the 10% default selectivity can 
> regress the output cardinality estimate of an aggregation node that has a 
> HAVING predicate.
> https://gerrit.cloudera.org/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test#138
> Attached are performance_result.txt and some profiles for comparison.
> This was not an issue until now because NDV-based cardinality estimation 
> often overestimates already, so the 10% default selectivity still results in 
> a higher estimate than the actual runtime cardinality. But as the 
> cardinality estimate gets better, this 10% default selectivity can in turn 
> cause an underestimation. We should consider raising the default selectivity 
> above 10% for HAVING predicates. 50% might be better.
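
To illustrate the reasoning in the description, here is a minimal Python
sketch. It assumes the post-HAVING estimate is simply the estimated group
count multiplied by the default selectivity. The function name and all
numbers are hypothetical; they are not taken from Impala's planner code or
from the attached profiles.

```python
def having_output_estimate(group_estimate: int, selectivity: float) -> int:
    """Estimated rows leaving an aggregation after a HAVING predicate.

    Simplifying assumption: output = estimated groups * selectivity.
    """
    return max(1, round(group_estimate * selectivity))


# Hypothetical scenario: 1,000,000 real groups, 40% of them pass HAVING.
actual_output = 400_000

# Before: the NDV-based group estimate overshoots (assumed 8x too high),
# so a 10% selectivity still yields an estimate above the actual output.
loose_10 = having_output_estimate(8_000_000, 0.10)

# After: the group estimate is accurate, so 10% now undershoots.
tight_10 = having_output_estimate(1_000_000, 0.10)

# The same accurate group estimate with the proposed 50% selectivity.
tight_50 = having_output_estimate(1_000_000, 0.50)

print(loose_10, tight_10, tight_50, actual_output)
```

In this made-up scenario the overestimated group count hides the low
selectivity, the accurate group count exposes it as an underestimate, and
50% lands closer to the assumed actual output. Real HAVING pass rates vary
per query, so this only shows the direction of the effect.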



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
