[
https://issues.apache.org/jira/browse/IMPALA-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aman Sinha resolved IMPALA-12006.
---------------------------------
Fix Version/s: Impala 4.3.0
Resolution: Fixed
> Outer/inner join cardinality highly overestimated
> -------------------------------------------------
>
> Key: IMPALA-12006
> URL: https://issues.apache.org/jira/browse/IMPALA-12006
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.2.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
> Fix For: Impala 4.3.0
>
>
> In one of the use cases, we have seen the cardinality estimate for left outer
> join highly overestimated. The plan is complex and only a partial output is
> shown below (with the column names anonymized):
> {noformat}
> 57:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
> | hash-table-id=121
> | hash predicates: a.id = a.id
> | fk/pk conjuncts: none
> | mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB
> thread-reservation=0
> | tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N,29N,31N
> row-size=2.63KB cardinality=3.90T
> | in pipelines: 06(GETNEXT), 26(OPEN)
> |
> |--F1253:PLAN FRAGMENT hosts=13 instances=13
> | | Per-Instance Resources: mem-estimate=1.10GB mem-reservation=204.00MB
> thread-reservation=1
> | JOIN BUILD
> | | join-table-id=121 plan-id=122 cohort-id=25
> | | build expressions: a.id
> | | mem-estimate=1.08GB mem-reservation=204.00MB spill-buffer=2.00MB
> thread-reservation=0
> | |
> | 1758:EXCHANGE [BROADCAST]
> | | mem-estimate=20.87MB mem-reservation=0B thread-reservation=0
> | | tuple-ids=29,31 row-size=85B cardinality=9.56M
> | | in pipelines: 26(GETNEXT)
> | |
> ...
> ...
> ...
> 56:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
> | hash predicates: ifnull(a.id, a.id) = a.id
> | fk/pk conjuncts: assumed fk/pk
> | mem-estimate=0B mem-reservation=0B spill-buffer=2.00MB
> thread-reservation=0
> | tuple-ids=4N,3,6N,8N,9N,11N,14N,16N,19N,21N,24N,26N row-size=2.55KB
> cardinality=14.97G
> | in pipelines: 06(GETNEXT), 22(OPEN)
> {noformat}
> Note that the left input of the join is estimated as 14.97G rows, right input
> as 9.56M rows but the LOJ estimate is 3.9T rows. We need to investigate why
> that is so and fix it. The NDV of the based column involved in the join is
> 36661 but in the lower join there are functions involved in the join
> condition.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]