[
https://issues.apache.org/jira/browse/IMPALA-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers reassigned IMPALA-8038:
-----------------------------------
Assignee: (was: Paul Rogers)
> Incorrect estimation of selectivity of OR expression
> ----------------------------------------------------
>
> Key: IMPALA-8038
> URL: https://issues.apache.org/jira/browse/IMPALA-8038
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Priority: Major
>
> Suppose we have a query in which we include an OR expression:
> {code:sql}
> select *
> from functional.alltypestiny t
> where t.id = 10
> or t.bool_col = true
> {code}
> We can compute the join cardinality, {{|join|}} by observing that:
> * Either id will match,
> * Or, month will match,
> * Or, both will match.
> The number of matches is given as {{NDV(col)/|T|}}, and:
> * {{|T| = 8}}
> * {{NDV(id) = |T| = 8}}
> * {{NDV(bool_col) = 2}}
> * {{|T 𝜎 id=10| = |T|/NDV(id) = 1}}
> * {{|T 𝜎 bool_col=true| = |T|/NDV(bool_col) = 4}}
> So, if one or the other (or both) columns match, this means that we get
> either 1 row or 4 rows. So, the estimated cardinality should be the maximum
> of the two terms:
> {noformat}
> |bool_col| = NDV(bool_col)
> |id| = NDV(id)
> |join| = max( |T|/|id|, |T|/|bool_col| )
> = max(1, 4)
> = 4
> {noformat}
> Let's see what we actually get:
> {noformat}
> PLAN-ROOT SINK
> | mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 00:SCAN HDFS [functional.alltypestiny t]
> partitions=4/4 files=4 size=460B
> predicates: t.id = CAST(10 AS INT) OR t.bool_col = TRUE
> tuple-ids=0 row-size=89B cardinality=5
> {noformat}
> So, looks like the code uses {{|id| + |bool_col|}}, not the correct {{max(
> |id|, |bool_col| )}}.
> While the error here is small, it can be larger in other cases. Bad errors
> lead to bad join estimation, leading to bad joins, leading to poor
> performance, leading to unhappy users.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]