[jira] [Assigned] (PHOENIX-6832) Uncovered Global Secondary Indexes

Kadir Ozdemir (Jira) Sun, 13 Nov 2022 10:56:07 -0800


     [ 
https://issues.apache.org/jira/browse/PHOENIX-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kadir Ozdemir reassigned PHOENIX-6832:
--------------------------------------

    Assignee: Kadir Ozdemir

> Uncovered Global Secondary Indexes
> ----------------------------------
>
>                 Key: PHOENIX-6832
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6832
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir Ozdemir
>            Priority: Major
>
> An index can be called an uncovered index if the index cannot serve a query 
> alone. The sole purpose of an uncovered index would be identifying the data 
> table rows to be scanned for the query.  This implies that the DDL for an 
> uncovered index does not have the INCLUDE clause. 
> Conversely, an index is called a covered index if the index can serve a query 
> alone. 
> Please note that a covered index does not mean that it can cover all queries. 
> It just means that it can cover a query. A covered index can still cover some 
> queries even if the index DDL does not have the INCLUDE clause. This is 
> because a given query may reference only PK and/or indexed columns, and thus 
> a covered index without any included columns can serve this query by itself 
> (i.e., without joining index rows with data table rows). Another use case 
> for covered indexes without included columns is count(*) queries. 
> Currently, Phoenix uses indexes for count(*) queries by default.
> Since uncovered indexes will be used to identify data table rows affected by 
> a given query and the column values will be picked up from the data table, we 
> can provide a solution that is much simpler than the solution for covered 
> indexes by taking advantage of the fact that the data table is the source 
> of truth, and an index table is used only to map secondary keys to 
> primary keys to eliminate full table scans. The correctness of such a 
> solution is ensured if for every data table row, there exists an index row. 
> Then our solution to update the data tables and their indexes in a consistent 
> fashion for global secondary indexes would be a two-phase update approach, 
> where we first insert the index table rows and, only if those writes succeed, 
> then update the data table rows. 
> This approach does not require reading the existing data table rows, which is 
> currently required for covered indexes. Also, it does not require two-phase 
> commit writes for updating and maintaining global secondary index table rows. 
> Eliminating a data table read operation and an RPC call to update the index 
> row verification status on the corresponding index row would cut down index 
> write latency overhead by at least 50% for global uncovered indexes when 
> compared to global covered indexes. This is because global covered indexes 
> require one data table read and two index write operations for every data 
> table update whereas global uncovered indexes would require only one index 
> write. For batch writes, the expected performance and latency improvement 
> would be much higher than 50% since a batch of random row updates would no 
> longer require random seeks on the data table for reading existing data 
> table rows.
> PHOENIX-6458, PHOENIX-6501 and PHOENIX-6663 improve the performance and 
> efficiency of joining index rows with their data table rows when a covered 
> index cannot cover a given query. We can further leverage these improvements 
> to support uncovered indexes. 
> The uncovered indexes would be a significant performance improvement for 
> write-intensive workloads. Another common use case where uncovered indexes 
> will be desired is upsert select on the data table, where a 
> subset of rows is updated in a batch. In this use case, the select query 
> performance is greatly improved via a covered index, but the upsert part 
> suffers due to the covered index write overhead, especially when the selected 
> data table rows are not consecutively stored on disk, which is the most common 
> case.
> As mentioned before, the DDL for creating an uncovered index does not have 
> the INCLUDE clause. We can add the UNCOVERED keyword to indicate that the 
> index to be created is an uncovered index, for example, CREATE UNCOVERED INDEX. 
> As in the case of covered indexes, we can do read repair for uncovered 
> indexes too. The difference is that instead of using the verify status for 
> index rows, we would check if the corresponding data table row exists for a 
> given index row. Since we would always retrieve the data table rows to join 
> back with index rows for uncovered indexes, the read repair cost would occur 
> only for deleting invalid index rows. Also, the existing index reverse 
> verification and repair feature supported by IndexTool can be used to do bulk 
> repair operations from time to time.
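
The write ordering and read repair described above can be illustrated with a
small in-memory model. This is only a sketch under stated assumptions, not
Phoenix code: the class name, the "city" column, and the fail_index_writes
hook are all hypothetical, and plain Python containers stand in for the HBase
tables. It also assumes that an index row whose data table row exists but no
longer carries the indexed value counts as invalid, which the description
above does not spell out.

```python
from dataclasses import dataclass, field


class IndexWriteError(Exception):
    """Raised when the simulated index write fails."""


@dataclass
class UncoveredIndexedTable:
    # Data table: primary key -> row (column name -> value). Source of truth.
    data: dict = field(default_factory=dict)
    # Index table: (secondary key, primary key) pairs, no included columns.
    index: set = field(default_factory=set)
    indexed_column: str = "city"      # hypothetical indexed column
    fail_index_writes: bool = False   # hypothetical hook to simulate failure

    def upsert(self, pk, row):
        # Phase 1: write the index row first. The existing data table row is
        # never read, so a stale index row for an old value may be left behind.
        if self.fail_index_writes:
            raise IndexWriteError("index write failed; data table untouched")
        self.index.add((row[self.indexed_column], pk))
        # Phase 2: update the data table only after the index write succeeded.
        self.data[pk] = dict(row)

    def query(self, secondary_key):
        results = []
        # Iterate over a sorted copy so the index can be repaired in the loop.
        for sk, pk in sorted(e for e in self.index if e[0] == secondary_key):
            row = self.data.get(pk)
            # Assumption: a missing row or a mismatched indexed value both
            # mean the index row is invalid.
            if row is None or row.get(self.indexed_column) != sk:
                self.index.discard((sk, pk))  # read repair: drop invalid row
                continue
            results.append((pk, row))  # column values come from the data table
        return results
```

In this model every data table row always has an index row, because the index
write happens first; the reverse is not guaranteed, and the query path removes
index rows that have no matching data table row as it encounters them.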



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
