[jira] [Commented] (ASTERIXDB-3208) Incorrect selectivity for array predicates

ASF subversion and git services (Jira) Mon, 10 Jul 2023 11:16:08 -0700


    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741728#comment-17741728
 ]


ASF subversion and git services commented on ASTERIXDB-3208:
------------------------------------------------------------

Commit 290d5374a80b9e62bfe01f87ed689a0513d603e7 in asterixdb's branch 
refs/heads/master from Vijay Sarathy
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=290d5374a8 ]

[ASTERIXDB-3208][COMP] Fix for array predicate selectivity

Change-Id: I890b5c2a32b583a8d6e1f23c5f27d2c912ce3ef9
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17626
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Vijay Sarathy <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
Tested-by: Jenkins <[email protected]>


> Incorrect selectivity for array predicates
> ------------------------------------------
>
>                 Key: ASTERIXDB-3208
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3208
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: COMP - Compiler
>    Affects Versions: 0.9.3
>            Reporter: Vijay Sarathy
>            Assignee: Vijay Sarathy
>            Priority: Major
>              Labels: triaged
>
> For the following CH2 query fragment:
> SELECT count(*) as revenue
> FROM   orders o, o.o_orderline ol
> WHERE  ol.ol_delivery_d  >= '2016-01-01 00:00:00.000000'
>   AND  ol.ol_delivery_d < '2017-01-01 00:00:00.000000';
> The cardinality of orders is 300K and there is an average of 10 orderlines 
> per order, so the number of orderlines is ~3M.
> This query returns 320378 orderlines, so the selectivity (actual, not 
> estimated) of the predicate is 320378/3M = ~0.107.
> When we estimate the selectivity using samples, the sample query returns 922 
> docs, but since we create the sample on orders (whose sample size is 1063), 
> we compute the selectivity as 922/1063 = 0.867, which is clearly incorrect. 
> Since the selectivity is too high, we do not choose an index scan, which 
> leads to poor performance.
> The selectivity should be computed against the cardinality of the orderlines 
> in the sample, which is 10630 (1063 sampled orders x 10 orderlines each), so 
> the estimated selectivity should be 922/10630 = 0.0867. When the query has an 
> UNNEST operator, we need to account for an "unnesting factor" in the sample 
> size used for selectivity estimation.
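
The arithmetic in the description can be sketched as follows. This is only an
illustration of the "unnesting factor" idea using the numbers quoted above; the
function name and parameters are hypothetical and are not taken from the
AsterixDB code base or from the commit itself.

```python
def estimate_selectivity(matching_rows, sample_docs, unnest_factor=1.0):
    """Estimate predicate selectivity from a sample (hypothetical helper).

    matching_rows: rows returned by the sample query
    sample_docs:   number of top-level documents in the sample
    unnest_factor: average number of array elements per document
                   (1.0 when the query has no UNNEST)
    """
    if sample_docs <= 0 or unnest_factor <= 0:
        raise ValueError("sample_docs and unnest_factor must be positive")
    # Once the array is unnested, the predicate is evaluated per element,
    # so the denominator is the number of elements, not documents.
    effective_sample_size = sample_docs * unnest_factor
    return min(1.0, matching_rows / effective_sample_size)


# Numbers from the issue description: 922 matches, 1063 sampled orders,
# about 10 orderlines per order.
naive = estimate_selectivity(922, 1063)
corrected = estimate_selectivity(922, 1063, unnest_factor=10)

print(f"naive:     {naive:.4f}")
print(f"corrected: {corrected:.4f}")
```

With the unnesting factor applied, the estimate moves from roughly 0.87 to
roughly 0.087, which is much closer to the actual selectivity reported above.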



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
