shahrzad shirazi created ASTERIXDB-3523:
-------------------------------------------

             Summary: Eliminating Non-Matching Secondary Keys After Secondary 
Index Search
                 Key: ASTERIXDB-3523
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-3523
             Project: Apache AsterixDB
          Issue Type: Improvement
          Components: COMP - Compiler
            Reporter: shahrzad shirazi


In range queries with only a lower or only an upper bound, especially when data 
is heterogeneous or contains many null values, a secondary index search can 
return numerous records that don’t ultimately match the query conditions. These 
records still proceed to the primary index search and are eliminated only 
afterward.

For example, consider the following queries on a *customers* dataset with a 
secondary index on the *age* field, which is not declared in the dataset's 
datatype:

{*}Query 1{*}:
{code:sql}
SELECT * FROM customers c WHERE c.age < 20;
{code}
If many records have null or missing age values, the secondary index search 
will return numerous keys, which will pass through the primary index search but 
be filtered out afterward.

{*}Query 2{*}:
{code:sql}
SELECT * FROM customers c WHERE c.age > 40;
{code}
Similarly, if there are records with non-numeric values in the age field, these 
will be included in the secondary index results and pass through the primary 
index search but be filtered out afterward.

 

A solution to this inefficiency is to add a selection operator immediately 
after the secondary index search. This operator would filter out secondary keys 
that don’t meet the query criteria before they proceed to the primary index 
search, reducing unnecessary processing and improving overall efficiency.
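
The effect of the proposed operator can be sketched with a small toy model. 
This is not AsterixDB code: the names (sort_key, secondary_search, matches, 
run_query), the sample data, and the assumed key ordering (null/missing sorts 
before numbers, strings sort after numbers) are all hypothetical and exist only 
to illustrate why a one-sided range scan picks up non-matching keys and how a 
selection placed before the primary lookup avoids the wasted work.

```python
from bisect import bisect_left, bisect_right

# Assumed ordering for this sketch only: null/missing < numbers < strings.
def sort_key(value):
    if value is None:
        return (0, 0)
    if isinstance(value, (int, float)):
        return (1, value)
    return (2, str(value))

# Toy primary index: primary key -> record. "age" is an undeclared (open) field.
customers = {
    1: {"id": 1, "age": 15},
    2: {"id": 2, "age": None},       # null age
    3: {"id": 3},                    # missing age
    4: {"id": 4, "age": 45},
    5: {"id": 5, "age": "unknown"},  # non-numeric age
    6: {"id": 6, "age": 30},
    7: {"id": 7, "age": 18},
}

# Toy secondary index: (age, primary key) pairs sorted by the assumed ordering.
age_index = sorted(
    ((rec.get("age"), pk) for pk, rec in customers.items()),
    key=lambda entry: (sort_key(entry[0]), entry[1]),
)

def secondary_search(index, low=None, high=None):
    """Exclusive range scan; an absent bound leaves that side of the scan open."""
    keys = [sort_key(value) for value, _ in index]
    start = 0 if low is None else bisect_right(keys, sort_key(low))
    end = len(keys) if high is None else bisect_left(keys, sort_key(high))
    return index[start:end]

def matches(value, low=None, high=None):
    """The actual query predicate: only numeric values can satisfy it."""
    if not isinstance(value, (int, float)):
        return False
    return (low is None or value > low) and (high is None or value < high)

def run_query(primary, index, low=None, high=None, prefilter=False):
    """Return (matching records, number of primary index lookups performed)."""
    candidates = secondary_search(index, low, high)
    if prefilter:
        # Proposed selection: check the secondary key before the primary lookup.
        candidates = [(v, pk) for v, pk in candidates if matches(v, low, high)]
    results, lookups = [], 0
    for _, pk in candidates:
        record = primary[pk]
        lookups += 1
        # Existing selection applied after the primary index search.
        if matches(record.get("age"), low, high):
            results.append(record)
    return results, lookups

# Query 1 (age < 20) and Query 2 (age > 40), without and with the early filter.
q1_plain = run_query(customers, age_index, high=20)
q1_early = run_query(customers, age_index, high=20, prefilter=True)
q2_plain = run_query(customers, age_index, low=40)
q2_early = run_query(customers, age_index, low=40, prefilter=True)
```

In this toy data, both plans return the same records for each query, but the 
plan with the early filter performs fewer primary lookups because the null, 
missing, and non-numeric keys never reach the primary index.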



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
