[
https://issues.apache.org/jira/browse/PHOENIX-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kadir Ozdemir updated PHOENIX-7032:
-----------------------------------
Description:
The secondary indexes supported in Phoenix have been full indexes such that for
every data table row there is an index row. Generating an index row for every
data table row is not always required. For example, some use cases do not
require index rows for the data table rows in which indexed column values are
null. Such indexes are called sparse indexes. Partial indexes generalize the
concept of sparse indexing and allow users to specify the subset of the data
table rows for which index rows will be maintained. This subset is specified
using a WHERE clause added to the CREATE INDEX DDL statement.
Partial secondary indexes were first proposed by Michael Stonebraker
[here|https://dsf.berkeley.edu/papers/ERL-M89-17.pdf]. Since then several SQL
databases (e.g.,
[Postgres|https://www.postgresql.org/docs/current/indexes-partial.html] and
[SQLite|https://www.sqlite.org/partialindex.html]) and NoSQL databases (e.g.,
[MongoDB|https://www.mongodb.com/docs/manual/core/index-partial/]) have
supported some form of partial indexes. It is challenging to allow arbitrary
WHERE clauses in DDL statements. For example, Postgres does not allow
subqueries in these where clauses and SQLite supports much more restrictive
where clauses.
Supporting arbitrary where clauses creates challenges for query optimizers in
deciding the usability of a partial index for a given query. If the set of data
table rows that satisfy the query is a subset of the data table rows that the
partial index points back, then the query can use the index. Thus, the query
optimizer has to decide if the WHERE clause of the query implies the WHERE
clause of the index.
Michael Stonebraker [here|https://dsf.berkeley.edu/papers/ERL-M89-17.pdf]
suggests that an index WHERE clause is a conjunct of simple terms, i.e:
i-clause-1 and i-clause-2 and ... and i-clause-m where each clause is of the
form <column> <operator> <constant>. Hence, the qualification can be evaluated
for each tuple in the indicated relation without consulting additional tuples.
Phoenix partial indexes will initially support a more general set of index
WHERE clauses that can be evaluated on a single row with the following
exceptions
* Subqueries are not allowed.
* Like expressions are allowed with very limited support such that an index
WHERE clause with like expressions can imply/contain a query if the query has
the same like expressions that the index WHERE clause has.
* Comparison between columns are allowed without supporting transitivity, for
example, a > b and b > c does not imply a > c.
Partial indexes will be supported initially for global secondary indexes, i.e.,
covered global indexes and uncovered global indexes. The local secondary
indexes will be supported in future.
was:
The secondary indexes supported in Phoenix have been full indexes such that for
every data table row there is an index row. Generating an index row for every
data table row is not always required. For example, some use cases do not
require index rows for the data table rows in which indexed column values are
null. Such indexes are called sparse indexes. Partial indexes generalize the
concept of sparse indexing and allow users to specify the subset of the data
table rows for which index rows will be maintained. This subset is specified
using a WHERE clause added to the CREATE INDEX DDL statement.
Partial secondary indexes were first proposed by Michael Stonebraker
[here|https://dsf.berkeley.edu/papers/ERL-M89-17.pdf]. Since then several SQL
databases (e.g.,
[Postgres|https://www.postgresql.org/docs/current/indexes-partial.html] and
[SQLite|https://www.sqlite.org/partialindex.html]) and NoSQL databases (e.g.,
[MongoDB|https://www.mongodb.com/docs/manual/core/index-partial/]) have
supported some form of partial indexes. It is challenging to allow arbitrary
WHERE clauses in DDL statements. For example, Postgres does not allow
subqueries in these where clauses and SQLite supports much more restrictive
where clauses.
Supporting arbitrary where clauses creates challenges for query optimizers in
deciding the usability of a partial index for a given query. If the set of data
table rows that satisfy the query is a subset of the data table rows that the
partial index points back, then the query can use the index. Thus, the query
optimizer has to decide if the WHERE clause of the query implies the WHERE
clause of the index.
Michael Stonebraker [here|https://dsf.berkeley.edu/papers/ERL-M89-17.pdf]
suggests that an index WHERE clause is a conjunct of simple terms, i.e:
i-clause-1 and i-clause-2 and ... and i-clause-m where each clause is of the
form field operator constant. Hence, the qualification can be evaluated for
each tuple in the indicated relation without consulting additional tuples.
The first implementation of Phoenix partial indexes will support a more general
set of index WHERE clauses where simple terms each of which is in the form
<column name> <operator> <constant> are connected through any combination of
AND and OR operators. Formally, an allowed index WHERE clause can be
represented by any expression tree such that non-leaf nodes are AND, OR, or
NOT operators, and leaf nodes are simple terms each of which is in the form
<column name> <operator> <constant> where a column is a data table column, an
operator is a comparison operator, and a constant is a value from the domain of
the column.
Partial indexes will be supported for all index types, that is, for local
indexes, covered global indexes and uncovered global indexes.
> Partial Global Secondary Indexes
> --------------------------------
>
> Key: PHOENIX-7032
> URL: https://issues.apache.org/jira/browse/PHOENIX-7032
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Kadir Ozdemir
> Assignee: Kadir Ozdemir
> Priority: Major
>
> The secondary indexes supported in Phoenix have been full indexes such that
> for every data table row there is an index row. Generating an index row for
> every data table row is not always required. For example, some use cases do
> not require index rows for the data table rows in which indexed column values
> are null. Such indexes are called sparse indexes. Partial indexes generalize
> the concept of sparse indexing and allow users to specify the subset of the
> data table rows for which index rows will be maintained. This subset is
> specified using a WHERE clause added to the CREATE INDEX DDL statement.
> Partial secondary indexes were first proposed by Michael Stonebraker
> [here|https://dsf.berkeley.edu/papers/ERL-M89-17.pdf]. Since then several SQL
> databases (e.g.,
> [Postgres|https://www.postgresql.org/docs/current/indexes-partial.html] and
> [SQLite|https://www.sqlite.org/partialindex.html]) and NoSQL databases
> (e.g., [MongoDB|https://www.mongodb.com/docs/manual/core/index-partial/])
> have supported some form of partial indexes. It is challenging to allow
> arbitrary WHERE clauses in DDL statements. For example, Postgres does not
> allow subqueries in these where clauses and SQLite supports much more
> restrictive where clauses.
> Supporting arbitrary where clauses creates challenges for query optimizers in
> deciding the usability of a partial index for a given query. If the set of
> data table rows that satisfy the query is a subset of the data table rows
> that the partial index points back, then the query can use the index. Thus,
> the query optimizer has to decide if the WHERE clause of the query implies
> the WHERE clause of the index.
> Michael Stonebraker [here|https://dsf.berkeley.edu/papers/ERL-M89-17.pdf]
> suggests that an index WHERE clause is a conjunct of simple terms, i.e:
> i-clause-1 and i-clause-2 and ... and i-clause-m where each clause is of the
> form <column> <operator> <constant>. Hence, the qualification can be
> evaluated for each tuple in the indicated relation without consulting
> additional tuples.
> Phoenix partial indexes will initially support a more general set of index
> WHERE clauses that can be evaluated on a single row with the following
> exceptions
> * Subqueries are not allowed.
> * Like expressions are allowed with very limited support such that an index
> WHERE clause with like expressions can imply/contain a query if the query has
> the same like expressions that the index WHERE clause has.
> * Comparison between columns are allowed without supporting transitivity,
> for example, a > b and b > c does not imply a > c.
> Partial indexes will be supported initially for global secondary indexes,
> i.e., covered global indexes and uncovered global indexes. The local
> secondary indexes will be supported in future.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)