shahrzad shirazi created ASTERIXDB-3523:
-------------------------------------------
Summary: Eliminating Non-Matching Secondary Keys After Secondary
Index Search
Key: ASTERIXDB-3523
URL: https://issues.apache.org/jira/browse/ASTERIXDB-3523
Project: Apache AsterixDB
Issue Type: Improvement
Components: COMP - Compiler
Reporter: shahrzad shirazi
In lower- or upper-bounded range queries, especially when the data is heterogeneous
or contains many null values, a secondary index search can return many
keys that do not ultimately satisfy the query condition. The corresponding records
still go through the primary index search and are eliminated only afterward.
For example, consider the following queries on a *customers* dataset with a
secondary index on the *age* field, which is not declared in the datatype:
{*}Query 1{*}:
{code:sql}
SELECT * FROM customers c WHERE c.age < 20;
{code}
If many records have null or missing age values, the secondary index search
will return the keys of those records as well. They pass through the primary
index search and are filtered out only afterward.
{*}Query 2{*}:
{code:sql}
SELECT * FROM customers c WHERE c.age > 40;
{code}
Similarly, if there are records with non-numeric values in the age field, these
will be included in the secondary index results and pass through the primary
index search but be filtered out afterward.
A solution to this inefficiency is to add a selection operator immediately
after the secondary index search. This operator would filter out secondary keys
that don’t meet the query criteria before they proceed to the primary index
search, so that primary index lookups are issued only for keys that can match.
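
The effect of the proposed operator can be illustrated with a small Python model. This is only a sketch under stated assumptions, not AsterixDB code: the secondary index is modeled as a sorted list of (secondary key, primary key) pairs, null and missing are both represented as None, and the cross-type ordering (null/missing below numbers, strings above numbers) is assumed for illustration. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sort_key(value):
    # ASSUMED ordering for this sketch: null/missing < numbers < strings.
    if value is None:
        return (0, 0)
    if is_number(value):
        return (1, value)
    return (2, str(value))


def build_secondary_index(records):
    """Sorted (secondary_key, primary_key) pairs on the 'age' field."""
    entries = [(rec.get("age"), pk) for pk, rec in records.items()]
    return sorted(entries, key=lambda e: (sort_key(e[0]), e[1]))


def secondary_search(index, low=None, high=None):
    """Exclusive range scan; an absent bound runs to the end of the index."""
    keys = [sort_key(sk) for sk, _ in index]
    start = 0 if low is None else bisect_right(keys, sort_key(low))
    stop = len(index) if high is None else bisect_left(keys, sort_key(high))
    return index[start:stop]


def run_query(records, index, predicate, low=None, high=None, filter_early=False):
    """Return (matching records, number of primary index lookups)."""
    candidates = secondary_search(index, low, high)
    if filter_early:
        # Proposed selection operator: apply the predicate to the secondary
        # key before any primary index lookup is made.
        candidates = [(sk, pk) for sk, pk in candidates if predicate(sk)]
    primary_lookups = len(candidates)
    # The original post-primary selection stays in place in both plans.
    results = [records[pk] for _, pk in candidates
               if predicate(records[pk].get("age"))]
    return results, primary_lookups


def less_than_20(age):
    return is_number(age) and age < 20


def greater_than_40(age):
    return is_number(age) and age > 40


records = {
    1: {"name": "a", "age": 15},
    2: {"name": "b", "age": None},       # null age
    3: {"name": "c"},                    # missing age
    4: {"name": "d", "age": 45},
    5: {"name": "e", "age": "unknown"},  # non-numeric age
    6: {"name": "f", "age": 30},
}
index = build_secondary_index(records)

for label, pred, bounds in [("age < 20", less_than_20, {"high": 20}),
                            ("age > 40", greater_than_40, {"low": 40})]:
    _, before = run_query(records, index, pred, **bounds)
    _, after = run_query(records, index, pred, filter_early=True, **bounds)
    print(label, "primary lookups without filter:", before, "with filter:", after)
```

In this toy data set both plans return the same records, but the plan with the early filter performs fewer primary lookups, because the null, missing, and non-numeric keys are dropped right after the secondary index search.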