[
https://issues.apache.org/jira/browse/ASTERIXDB-3208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741728#comment-17741728
]
ASF subversion and git services commented on ASTERIXDB-3208:
------------------------------------------------------------
Commit 290d5374a80b9e62bfe01f87ed689a0513d603e7 in asterixdb's branch
refs/heads/master from Vijay Sarathy
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=290d5374a8 ]
[ASTERIXDB-3208][COMP] Fix for array predicate selectivity
Change-Id: I890b5c2a32b583a8d6e1f23c5f27d2c912ce3ef9
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/17626
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Vijay Sarathy <[email protected]>
Reviewed-by: Ali Alsuliman <[email protected]>
Tested-by: Jenkins <[email protected]>
> Incorrect selectivity for array predicates
> ------------------------------------------
>
> Key: ASTERIXDB-3208
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-3208
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: COMP - Compiler
> Affects Versions: 0.9.3
> Reporter: Vijay Sarathy
> Assignee: Vijay Sarathy
> Priority: Major
> Labels: triaged
>
> For the following CH2 query fragment:
> SELECT count(*) as revenue
> FROM orders o, o.o_orderline ol
> WHERE ol.ol_delivery_d >= '2016-01-01 00:00:00.000000'
> AND ol.ol_delivery_d < '2017-01-01 00:00:00.000000';
> Cardinality of orders is 300K, and there are an average of 10 orderlines per
> order, so the number of orderlines is ~3M.
> This query returns 320378 orderlines, so the actual (not estimated)
> selectivity of the predicate is 320378/3M = ~0.107.
> When we estimate the selectivity using samples, the sample query returns 922
> docs. Because the sample is created on orders (sample size 1063), we compute
> the selectivity as 922/1063 = 0.867, which is clearly incorrect.
> Since this estimate is too high, we do not choose an index scan, which
> leads to poor performance.
> The selectivity should instead be computed against the number of orderlines
> in the sample, which is 10630, so the estimated selectivity should be
> 922/10630 = 0.0867. When the query has an UNNEST operator, we need to account
> for an "unnesting factor" in the sample size used for selectivity estimation.
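
The arithmetic in the description above can be sketched as a small calculation. This is only an illustration of the idea, not the code from the commit: the function name `estimate_selectivity` and the parameter `unnest_factor` are hypothetical, and the factor of 10 is taken from the figures quoted in the issue (10630 orderlines over 1063 sampled orders).

```python
def estimate_selectivity(matching_rows, sample_size, unnest_factor=1.0):
    """Estimate predicate selectivity from a sample.

    matching_rows: rows returned by the sample query (after any UNNEST).
    sample_size:   number of top-level documents in the sample.
    unnest_factor: hypothetical name for the average number of array
                   elements per document; 1.0 means no UNNEST.
    """
    if sample_size <= 0 or unnest_factor <= 0:
        raise ValueError("sample_size and unnest_factor must be positive")
    # The denominator must count unnested rows, not top-level documents.
    effective_sample_size = sample_size * unnest_factor
    return matching_rows / effective_sample_size


# Figures quoted in the issue description.
SAMPLE_ORDERS = 1063       # size of the sample on orders
MATCHING_DOCS = 922        # docs returned by the sample query
AVG_ORDERLINES = 10        # average orderlines per order

naive = estimate_selectivity(MATCHING_DOCS, SAMPLE_ORDERS)
corrected = estimate_selectivity(MATCHING_DOCS, SAMPLE_ORDERS, AVG_ORDERLINES)
actual = 320378 / 3_000_000

print(f"naive={naive:.4f} corrected={corrected:.4f} actual={actual:.4f}")
```

With the unnesting factor applied, the estimate falls to the same order of magnitude as the actual selectivity, which is the range where an index scan becomes a plausible choice.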