[
https://issues.apache.org/jira/browse/IMPALA-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers reassigned IMPALA-8031:
-----------------------------------
Assignee: (was: Paul Rogers)
> Remove redundant inequalities for selectivity calcs
> ---------------------------------------------------
>
> Key: IMPALA-8031
> URL: https://issues.apache.org/jira/browse/IMPALA-8031
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Priority: Minor
>
> IMPALA-8035 describes how Impala currently estimates inequalities: it lumps
> all non-equality predicates together and assumes a single 0.1 selectivity for
> the whole group. As we try to fix that, we hit another issue. This ticket
> assumes that inequalities are already handled correctly on a per-predicate basis.
> If a query has two inequalities on the same column, joined by AND and of the
> same “direction”, then only the tighter one applies: the smaller bound for
> < or <=, the larger bound for > or >=.
> Selectivity estimates should reflect this fact.
> {noformat}
> select *
> from tpch.customer c
> where c.c_custkey < 1234
> and c.c_custkey < 2345
> ---- PLAN
> PLAN-ROOT SINK
> |
> 00:SCAN HDFS [tpch.customer c]
> partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=28.44K
> predicates: c.c_custkey < 1234, c.c_custkey < 2345
> {noformat}
> Expected:
> {noformat}
> 00:SCAN HDFS [tpch.customer c]
> partitions=1/1 files=1 size=23.08MB row-size=218B cardinality=49.50K
> {noformat}
> The calcs don't even need to do the math. Just noticing two expressions in
> the same direction is sufficient: count only one of them toward overall
> selectivity; doesn't matter which one.
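
The rule described above can be sketched in a few lines. This is an illustration only, not Impala's planner code: the names Inequality, drop_redundant and combined_selectivity are hypothetical. The 0.33 per-predicate selectivity, the backoff rule (the i-th predicate contributes s^(1/(i+1))) and the 150,000-row table size are assumptions chosen because they reproduce the two cardinalities shown in the plans; the issue text itself does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inequality:
    """Hypothetical stand-in for a planner predicate such as c_custkey < 1234."""
    column: str
    op: str        # one of "<", "<=", ">", ">="
    value: float


def direction(op: str) -> str:
    """Classify an operator as an upper bound or a lower bound."""
    return "upper" if op in ("<", "<=") else "lower"


def drop_redundant(preds):
    """Keep one predicate per (column, direction) pair.

    No comparison of the constants is needed: with a fixed default
    selectivity, it does not matter which of the duplicates survives.
    """
    seen = set()
    kept = []
    for p in preds:
        key = (p.column, direction(p.op))
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def combined_selectivity(n_preds: int, per_pred: float = 0.33) -> float:
    """Assumed combination rule: multiply s^(1/(i+1)) over the predicates."""
    result = 1.0
    for i in range(n_preds):
        result *= per_pred ** (1.0 / (i + 1))
    return result


ROWS = 150_000  # assumed row count for tpch.customer

preds = [
    Inequality("c_custkey", "<", 1234),
    Inequality("c_custkey", "<", 2345),
]

# Counting both predicates gives the cardinality in the current plan.
current = ROWS * combined_selectivity(len(preds))
# Counting only one per (column, direction) gives the expected plan.
expected = ROWS * combined_selectivity(len(drop_redundant(preds)))
```

Under these assumptions, current comes out near 28.44K and expected at 49.50K, matching the two plans in the issue. Predicates in opposite directions on the same column (a range such as c_custkey > 5 and c_custkey < 10) are both kept, since neither makes the other redundant.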
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)