Paul Rogers created IMPALA-8042:
-----------------------------------

             Summary: Better selectivity estimate for BETWEEN
                 Key: IMPALA-8042
                 URL: https://issues.apache.org/jira/browse/IMPALA-8042
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers


The analyzer rewrites a BETWEEN expression into a pair of inequalities.  
IMPALA-8037 explains that the planner then groups all such non-equality 
conditions together and assigns them a selectivity of 0.1. IMPALA-8031 explains that 
the analyzer should handle inequalities better.

BETWEEN is a special case: it bounds the column on both sides, and that should 
inform the estimate. If we assume a selectivity of s for a single inequality, 
then BETWEEN should be something like s/2. The intuition is that if c >= x 
includes, say, ⅓ of values, and c <= y also includes ⅓ of values, then 
c BETWEEN x AND y should select a narrower set of values, say ⅙.
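
As a worked instance of the intuition above, the following Python sketch compares the proposed s/2 heuristic with a plain independence assumption (s * s). The function names are illustrative only and are not part of Impala.

```python
from fractions import Fraction


def between_selectivity_halved(s):
    """Proposed heuristic: BETWEEN selects half of what one inequality selects."""
    return s / 2


def between_selectivity_independent(s):
    """Alternative: treat the two inequalities as independent predicates."""
    return s * s


s = Fraction(1, 3)  # example selectivity of a single inequality
print(between_selectivity_halved(s))       # 1/6
print(between_selectivity_independent(s))  # 1/9
```

With s = ⅓ the halving heuristic gives the ⅙ mentioned above, while the independence assumption gives a smaller ⅑.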

[Ramakrishnan and Gehrke|http://pages.cs.wisc.edu/~dbbook/openAccess/Minibase/optimizer/costformula.html] 
recommend 0.4 for BETWEEN, 0.3 for inequality, and 0.3^2 = 0.09 for the 
general expression x <= c AND c <= y. Note the discrepancy between the compound 
inequality case and the BETWEEN case, likely reflecting the additional 
information we obtain when the user chooses to use BETWEEN.
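
To make the size of that discrepancy concrete, here is a small Python sketch that turns the constants quoted above into row estimates. The table size and the helper name are made up for illustration; the constants are taken from the text as written.

```python
# Constants as quoted in the text above.
BETWEEN_SEL = 0.4
INEQUALITY_SEL = 0.3


def estimated_rows(row_count, selectivity):
    """Cardinality estimate: input rows times predicate selectivity."""
    return round(row_count * selectivity)


rows = 1_000_000  # hypothetical table size
compound_sel = INEQUALITY_SEL * INEQUALITY_SEL  # x <= c AND c <= y

print(estimated_rows(rows, BETWEEN_SEL))   # 400000
print(estimated_rows(rows, compound_sel))  # 90000
```

Under these constants the same logical range predicate yields estimates that differ by more than a factor of four, depending only on how the user wrote it.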

To implement a special BETWEEN selectivity in Impala, we must remember, during 
the rewrite to a compound inequality, that the predicate originated as a 
BETWEEN, so that the planner can apply the BETWEEN selectivity to it.
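
One way to picture this is a rewrite that tags its output so the estimator can recognize it later. The Python sketch below is only an illustration under stated assumptions: the classes, the from_between flag, and the selectivity function are hypothetical and do not correspond to Impala's actual frontend classes. It uses the 0.1 default and the s/2 heuristic described above.

```python
from dataclasses import dataclass

# Default that the text says is assigned to grouped non-equality conditions.
DEFAULT_INEQUALITY_SEL = 0.1


@dataclass(frozen=True)
class Compare:
    column: str
    op: str        # ">=" or "<="
    value: object


@dataclass(frozen=True)
class And:
    left: Compare
    right: Compare
    from_between: bool = False  # hypothetical marker that survives the rewrite


@dataclass(frozen=True)
class Between:
    column: str
    low: object
    high: object


def rewrite_between(expr):
    """Rewrite c BETWEEN low AND high into c >= low AND c <= high, keeping a tag."""
    return And(
        Compare(expr.column, ">=", expr.low),
        Compare(expr.column, "<=", expr.high),
        from_between=True,
    )


def selectivity(expr, s=DEFAULT_INEQUALITY_SEL):
    """Tagged conjunctions get s/2; everything else gets the grouped default s."""
    if isinstance(expr, And) and expr.from_between:
        return s / 2
    return s


rewritten = rewrite_between(Between("c", 10, 20))
plain = And(Compare("c", ">=", 10), Compare("c", "<=", 20))
print(selectivity(rewritten))  # 0.05
print(selectivity(plain))      # 0.1
```

Carrying a flag on the rewritten node is just one option; storing a precomputed selectivity on the node would serve the same purpose.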



