[
https://issues.apache.org/jira/browse/IMPALA-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers reassigned IMPALA-8035:
-----------------------------------
Assignee: (was: Paul Rogers)
> Planner estimate incorrect for non-equi-join case
> -------------------------------------------------
>
> Key: IMPALA-8035
> URL: https://issues.apache.org/jira/browse/IMPALA-8035
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Priority: Major
>
> The code in {{JoinNode.getJoinCardinality()}} makes a bold (and incorrect)
> assumption:
> {code:java}
> if (eqJoinConjunctSlots.isEmpty()) {
>   // There are no eligible equi-join conjuncts. Optimistically assume FK/PK with a
>   // join selectivity of 1.
>   return probeCard;
> }
> {code}
> Suppose we have a join of the form:
> {code:sql}
> SELECT * FROM t1, t2
> {code}
> Or
> {code:sql}
> SELECT *
> FROM t1, t2
> WHERE t1.a > t2.b
> {code}
> The code assumes that each t1 row will match just one t2 row, which seems
> very unlikely.
> In fact, there are well-known algorithms to estimate this case. The first
> example is a Cartesian product with cardinality {{|T1| * |T2|}}.
> The second uses the selectivity of the expression:
> {noformat}
> |T1 ⋈ T2| = |T1| * |T2| * sel(T1.a > T2.b)
> {noformat}
> Without a histogram, we cannot obtain an accurate estimation of the
> cardinality (but see IMPALA-8032). But, we can assume that there is some
> reduction, else the user would not have included the clause. Most systems
> assume a value of 0.1 or 0.45 for inequality. See IMPALA-7601 and
> [Ramakrishnan and Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html].
> However, as noted in IMPALA-7601, we don’t accurately estimate the
> selectivity of inequalities, so work is needed there also.
> The reason that this bug does not cause problems for users is that,
> presumably, most real-world queries use equi-joins. However, some kinds of
> analysis queries use other predicates, and Impala should support these use
> cases.
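
The two estimates described above (the Cartesian product and the selectivity-scaled product) could be sketched roughly as follows. This is only an illustration, not Impala's actual {{JoinNode}} code: the class name, method names, the 0.1 default, the use of -1 for "unknown cardinality", and the saturation on overflow are all assumptions made for the example.

```java
/** Hypothetical sketch of non-equi-join cardinality estimation; not Impala code. */
final class NonEquiJoinCardinalitySketch {
  // Assumed default selectivity for an inequality predicate when no
  // histogram is available (0.1 is one of the values mentioned above).
  static final double DEFAULT_INEQUALITY_SELECTIVITY = 0.1;

  // Assumed convention for this sketch: a negative cardinality means "unknown".
  static final long UNKNOWN = -1L;

  /** |T1 x T2| = |T1| * |T2|, saturating at Long.MAX_VALUE on overflow. */
  static long crossJoinCardinality(long leftCard, long rightCard) {
    if (leftCard < 0 || rightCard < 0) return UNKNOWN;
    try {
      return Math.multiplyExact(leftCard, rightCard);
    } catch (ArithmeticException overflow) {
      return Long.MAX_VALUE;
    }
  }

  /** |T1 join T2| = |T1| * |T2| * sel(predicate), with sel in [0, 1]. */
  static long nonEquiJoinCardinality(long leftCard, long rightCard, double selectivity) {
    if (selectivity < 0.0 || selectivity > 1.0) {
      throw new IllegalArgumentException("selectivity must be in [0, 1]");
    }
    if (leftCard < 0 || rightCard < 0) return UNKNOWN;
    // Compute in double to avoid intermediate long overflow;
    // Math.round saturates at Long.MAX_VALUE for very large values.
    double estimate = (double) leftCard * (double) rightCard * selectivity;
    return Math.round(estimate);
  }
}
```

With these assumptions, a 1,000-row table joined to a 500-row table on {{t1.a > t2.b}} is estimated at 50,000 rows rather than the 1,000 rows returned by the current "selectivity of 1 against the probe side" shortcut.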