Paul Rogers created IMPALA-8042:
-----------------------------------
Summary: Better selectivity estimate for BETWEEN
Key: IMPALA-8042
URL: https://issues.apache.org/jira/browse/IMPALA-8042
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers
The analyzer rewrites a BETWEEN expression into a pair of inequalities.
IMPALA-8037 explains that the planner then groups all such non-equality
conditions together and assigns a selectivity of 0.1. IMPALA-8031 explains that
the analyzer should handle inequalities better.
BETWEEN is a special case because it carries extra information about the final
result: the two bounds together describe a single range. If we assume a
selectivity of s for one inequality, then BETWEEN should be something like s/2.
The intuition is that if c >= x includes, say, ⅓ of values, and c <= y includes
⅓ of values, then c BETWEEN x AND y should be a narrower set of values, say ⅙.
[Ramakrishnan and Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html]
recommend 0.4 for BETWEEN, 0.3 for inequality, and 0.3^2 = 0.09 for the
general expression x <= c AND c <= y. Note the discrepancy between the compound
inequality case and the BETWEEN case, likely reflecting the additional
information we obtain when the user chooses to use BETWEEN.
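As a rough illustration of the numbers above, the following Java sketch compares
three ways to estimate the selectivity of a range predicate: the product of two
independent inequality selectivities, the s/2 heuristic, and a fixed BETWEEN
constant. The class and method names are hypothetical and are not part of
Impala's planner; the constants 0.3 and 0.4 are the textbook values cited above.

```java
// Hypothetical demo class; none of these names exist in Impala.
final class BetweenSelectivityDemo {
  // Textbook default for a single inequality such as c >= x.
  static final double INEQUALITY_SEL = 0.3;
  // Textbook default for c BETWEEN x AND y.
  static final double BETWEEN_SEL = 0.4;

  // Treats the two inequalities as independent conditions: s * s.
  static double independentProduct(double s) {
    return s * s;
  }

  // The s/2 heuristic suggested for BETWEEN in this ticket.
  static double halved(double s) {
    return s / 2.0;
  }

  public static void main(String[] args) {
    System.out.printf("independent product: %.3f%n",
        independentProduct(INEQUALITY_SEL));
    System.out.printf("s/2 heuristic:       %.3f%n", halved(INEQUALITY_SEL));
    System.out.printf("fixed BETWEEN value: %.3f%n", BETWEEN_SEL);
  }
}
```

With s = ⅓, halved(s) gives the ⅙ used in the intuition above, while the
independent product gives 1/9.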
To implement a special BETWEEN selectivity in Impala, we must remember the
selectivity of BETWEEN during the rewrite to a compound inequality.
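One possible shape for this, sketched in Java under stated assumptions: the
rewrite attaches a selectivity hint to the compound predicate it produces, and
the estimator prefers the hint over its default rule. Every class, field, and
constant below is hypothetical and only stands in for the real expression
classes; the 0.15 value is simply s/2 with s = 0.3.

```java
import java.util.OptionalDouble;

// Hypothetical sketch; these are not Impala's actual expression classes.
final class BetweenRewriteSketch {
  // Assumed default for one inequality, and s/2 for a rewritten BETWEEN.
  static final double INEQUALITY_SEL = 0.3;
  static final double BETWEEN_SEL = INEQUALITY_SEL / 2.0;

  static abstract class Node {
    // Set only when a rewrite wants to pin the selectivity of its result.
    OptionalDouble selectivityHint = OptionalDouble.empty();
  }

  // A single comparison such as c >= x.
  static final class Compare extends Node {
    final String column;
    final String op;
    final String value;

    Compare(String column, String op, String value) {
      this.column = column;
      this.op = op;
      this.value = value;
    }
  }

  // A conjunction of two child predicates.
  static final class And extends Node {
    final Node left;
    final Node right;

    And(Node left, Node right) {
      this.left = left;
      this.right = right;
    }
  }

  // Rewrites c BETWEEN low AND high into c >= low AND c <= high,
  // recording on the result that it came from a BETWEEN.
  static Node rewriteBetween(String column, String low, String high) {
    And result = new And(
        new Compare(column, ">=", low),
        new Compare(column, "<=", high));
    result.selectivityHint = OptionalDouble.of(BETWEEN_SEL);
    return result;
  }

  // Uses the hint when present; otherwise multiplies child estimates
  // for And and falls back to the inequality default for a comparison.
  static double estimate(Node node) {
    if (node.selectivityHint.isPresent()) {
      return node.selectivityHint.getAsDouble();
    }
    if (node instanceof And) {
      And and = (And) node;
      return estimate(and.left) * estimate(and.right);
    }
    return INEQUALITY_SEL;
  }
}
```

A hand-written x <= c AND c <= y carries no hint and falls through to the
product rule, so the two forms keep distinct estimates.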