Kadir Ozdemir created PHOENIX-7032:
--------------------------------------

             Summary: Partial Secondary Indexes
                 Key: PHOENIX-7032
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7032
             Project: Phoenix
          Issue Type: New Feature
            Reporter: Kadir Ozdemir
            Assignee: Kadir Ozdemir


The secondary indexes supported in Phoenix have been full indexes such that for 
every data table row there is an index row. Generating an index row for every 
data table row is not always required. For example, some use cases do not 
require index rows for the data table rows in which indexed column values are 
null. Such indexes are called sparse indexes. Partial indexes generalize the 
concept of sparse indexing and allow users to specify the subset of the data 
table rows for which index rows will be maintained. This subset is specified 
using a WHERE clause added to the CREATE INDEX DDL statement.

Partial secondary indexes were first proposed by Michael Stonebraker 
[here|https://dsf.berkeley.edu/papers/ERL-M89-17.pdf]. Since then several SQL 
databases (e.g., 
[Postgres|https://www.postgresql.org/docs/current/indexes-partial.html] and 
[SQLite|https://www.sqlite.org/partialindex.html])  and NoSQL databases (e.g., 
[MongoDB|https://www.mongodb.com/docs/manual/core/index-partial/]) have 
supported some form of partial indexes. It is challenging to allow arbitrary 
WHERE clauses in DDL statements. For example, Postgres does not allow 
subqueries in these where clauses and SQLite supports much more restrictive 
where clauses. 

Supporting arbitrary where clauses creates challenges for query optimizers in 
deciding the usability of a partial index for a given query. If the set of data 
table rows that satisfy the query is a subset of the data table rows that the 
partial index points back, then the query can use the index. Thus, the query 
optimizer has to decide if the WHERE clause of the query implies the WHERE 
clause of the index. 

Michael Stonebraker [here|https://dsf.berkeley.edu/papers/ERL-M89-17.pdf] 
suggests that an index WHERE clause is a conjunct of simple terms, i.e: 
i-clause-1 and i-clause-2 and ... and i-clause-m where each clause is of the 
form field operator constant. Hence, the qualification can be evaluated for 
each tuple in the indicated relation without consulting additional tuples. 

The first implementation of Phoenix partial indexes will support a more general 
set of index WHERE clauses where simple terms each of which is in the form 
<column name>  <operator> <constant> are connected through any combination of 
AND and OR operators. Formally, an allowed index WHERE clause can be 
represented by any expression tree such that non-leaf nodes are AND, OR, or  
NOT operators, and leaf nodes are simple terms each of which is in the form 
<column name>  <operator> <constant> where  a column is a data table column, an 
operator is a comparison operator, and a constant is a value from the domain of 
the column.

Partial indexes will be supported for all index types, that is, for local 
indexes, covered global indexes and uncovered global indexes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to