[ 
https://issues.apache.org/jira/browse/ASTERIXDB-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Maxon updated ASTERIXDB-3208:
---------------------------------
    Labels: triaged  (was: )

> Incorrect selectivity for array predicates
> ------------------------------------------
>
>                 Key: ASTERIXDB-3208
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3208
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler
>    Affects Versions: 0.9.3
>            Reporter: Vijay Sarathy
>            Assignee: Vijay Sarathy
>            Priority: Major
>              Labels: triaged
>
> For the following CH2 query fragment:
> SELECT count(*) as revenue
> FROM   orders o, o.o_orderline ol
> WHERE  ol.ol_delivery_d  >= '2016-01-01 00:00:00.000000'
>   AND  ol.ol_delivery_d < '2017-01-01 00:00:00.000000';
> Cardinality of orders is 300K, there are an average of 10 orderlines per 
> orders, so the number of orderlines is ~3M.
> This query returns 320378 orderlines, so the selectivity (actual, not 
> estimated) of the predicate is 320378/3M = 0.106
> When we estimate the selectivity using samples, the sample query returns 922 
> docs, but since we create the sample on orders (whose sample size is 1063), 
> we compute the selectivity as 922/1063 = 0.867, which is clearly incorrect. 
> Since the selectivity is too high, we do not choose an index scan, which 
> leads to poor performance.
> The selectivity should be computed against the cardinality of the orderlines 
> in the sample which is 10630, so estimated selectivity should be 922/10630 = 
> 0.0867. When the query has an UNNEST operator, we need to account for an 
> "unnesting factor" in the sample size used for selectivity estimation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to