[jira] [Commented] (IMPALA-13542) Raise default selectivity for HAVING predicates

ASF subversion and git services (Jira) Fri, 13 Dec 2024 16:00:49 -0800


    [ https://issues.apache.org/jira/browse/IMPALA-13542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905641#comment-17905641 ]


ASF subversion and git services commented on IMPALA-13542:
----------------------------------------------------------

Commit 2828e473710cc246dbdeb9e1da772c45881cfddb in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2828e4737 ]

IMPALA-13480: VALIDATE_CARDINALITY in some aggregation tests

This patch enables the VALIDATE_CARDINALITY test option in several planner
tests that touch the aggregation node. Enabling it revealed three bugs.

First, in IMPALA-13405, the cardinality estimate of the MERGE-phase
aggregation is not capped against the output cardinality of the EXCHANGE
node. This patch fixes it by adding such a cap.
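The first fix amounts to an upper bound: a merge aggregation cannot emit more
rows than it receives from the EXCHANGE below it. A minimal Java sketch of that
cap, assuming -1 encodes an unknown cardinality; MergeAggCap and
capMergeCardinality are hypothetical names, not Impala's AggregationNode code.

```java
// Hypothetical sketch, not Impala's AggregationNode code.
// Assumes -1 encodes an unknown cardinality.
final class MergeAggCap {
  static final long UNKNOWN = -1;

  /**
   * Caps a MERGE-phase aggregation estimate by the output cardinality of the
   * EXCHANGE feeding it: a merge aggregation cannot emit more rows than it receives.
   */
  static long capMergeCardinality(long mergeEstimate, long exchangeCardinality) {
    if (exchangeCardinality == UNKNOWN) return mergeEstimate; // no bound available
    if (mergeEstimate == UNKNOWN) return exchangeCardinality; // fall back to the input bound
    return Math.min(mergeEstimate, exchangeCardinality);
  }
}
```

For example, capMergeCardinality(1_000_000, 250_000) yields 250,000, while an
estimate already below the exchange output (100 vs. 250,000) is left unchanged.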

Second, the tuple-based optimization from IMPALA-13405 can cause cardinality
underestimation if HAVING predicates exist. This is due to the default
selectivity of 10% applied to each HAVING predicate. This patch skips the
tuple-based optimization if AggregationNode.conjuncts_ is ever non-empty.
It stays skipped on stats recompute, even if conjuncts_ is transferred
into the next Merge AggregationNode above it in the plan. Skipping the
optimization causes the following PlannerTests (under
testdata/workloads/functional-planner/queries/PlannerTest/) to revert
their cardinality estimates to their state prior to IMPALA-13405:
- tpcds/tpcds-q39a.test
- tpcds/tpcds-q39b.test
- tpcds_cpu_cost/tpcds-q39a.test
- tpcds_cpu_cost/tpcds-q39b.test
In the future, we should consider raising the default selectivity for
HAVING predicates and undoing this skipping logic (IMPALA-13542).
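The "stays skipped" behavior described above can be modeled as a sticky flag:
once a node has seen HAVING conjuncts, the flag is never cleared, even after
those conjuncts move to the merge aggregation above it. A minimal Java sketch
under that assumption; AggNodeSketch, its fields, and its methods are
hypothetical stand-ins, not Impala's AggregationNode API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a sticky skip flag; not Impala's AggregationNode.
final class AggNodeSketch {
  // Stand-in for AggregationNode.conjuncts_ (HAVING predicates), kept as strings here.
  final List<String> conjuncts = new ArrayList<>();
  // Once set to true it is never reset, so every later stats recompute also skips.
  private boolean skipTupleBasedOptimization = false;

  /** Returns true if the tuple-based optimization may be applied in this computeStats pass. */
  boolean computeStats() {
    if (!conjuncts.isEmpty()) skipTupleBasedOptimization = true;
    return !skipTupleBasedOptimization;
  }

  /** Models transferring HAVING conjuncts to a merge aggregation above this node. */
  void transferConjunctsTo(AggNodeSketch mergeAgg) {
    mergeAgg.conjuncts.addAll(conjuncts);
    conjuncts.clear();
  }
}
```

In this model, a preaggregation that once carried a HAVING conjunct keeps
returning false from computeStats() after the transfer, and the receiving merge
aggregation skips the optimization too.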

Third, stats are not recomputed after conjunct transfer in multi-phase
aggregation. This will be fixed separately by IMPALA-13526.

Testing:
- Enable cardinality validation in testMultipleDistinct*
- Update aggregation.test to reflect current PlannerTest output.
  Added some test cases in aggregation.test.
- Run and pass TpcdsPlannerTest and TpcdsCpuPlannerTest.
- Selectively run more planner tests that touch AggregationNode; they
  pass.

Change-Id: Iadb4af9fd65fdb85b66fae1e403ccec8ca5eb102
Reviewed-on: http://gerrit.cloudera.org:8080/22184
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins


> Raise default selectivity for HAVING predicates
> -----------------------------------------------
>
>                 Key: IMPALA-13542
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13542
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Riza Suminto
>            Priority: Major
>         Attachments: TPCDS-Q11_iter007-baseline.txt, 
> TPCDS-Q11_iter007-test.txt, TPCDS-Q74_iter007-baseline.txt, 
> TPCDS-Q74_iter007-test.txt, performance_result.txt
>
>
> In my recent perf-AB-test, I found that the 10% default selectivity can 
> regress the output cardinality estimate of an aggregation node that has a 
> HAVING predicate.
> https://gerrit.cloudera.org/c/22032/2/testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test#138
> Attached are performance_result.txt and some profiles for comparison.
> This was not an issue until now because NDV-based cardinality estimation 
> often already overestimates, so the 10% default selectivity still results 
> in a higher estimate than the actual runtime cardinality. But as 
> cardinality estimates get better, this 10% default selectivity can in turn 
> cause underestimation. We should consider raising the default selectivity 
> for HAVING predicates above 10%; 50% might be better.
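The argument above is plain arithmetic: a 10% selectivity applied to an inflated
group count can still land above the true row count, but applied to an accurate
group count it falls below it. A minimal Java sketch, assuming each HAVING
predicate gets the same default selectivity and that predicates are independent
(selectivities multiply); Impala's actual rule for combining several conjuncts
may differ, and HavingSelectivity.estimate is a hypothetical helper.

```java
// Hypothetical arithmetic sketch, not Impala's planner code. Each HAVING
// predicate gets the same default selectivity and predicates are assumed
// independent, so their selectivities multiply.
final class HavingSelectivity {
  /** Estimated rows surviving numPredicates HAVING predicates; -1 (unknown) passes through. */
  static long estimate(long groupsEstimate, double defaultSelectivity, int numPredicates) {
    if (groupsEstimate < 0) return groupsEstimate; // unknown stays unknown
    double combined = Math.pow(defaultSelectivity, numPredicates);
    return Math.round(groupsEstimate * combined);
  }
}
```

With illustrative numbers (not taken from the attached profiles), suppose the
HAVING filter actually returns 30,000 rows. A loose estimate of 1,000,000 groups
at 10% gives 100,000, still an overestimate; an accurate estimate of 100,000
groups at 10% gives 10,000, an underestimate, while 50% gives 50,000. Under the
independence assumption, two predicates at 10% shrink 100,000 groups to 1,000.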



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
