Kadir Ozdemir created PHOENIX-7473:
--------------------------------------

             Summary: Eliminating index maintenance for CDC index
                 Key: PHOENIX-7473
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7473
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Kadir Ozdemir
            Assignee: Kadir Ozdemir


A CDC index is a log of row keys, with one entry for each data table row
mutation. These index rows are ordered by mutation timestamp for a given row.
This index is used for capturing recent changes to a data table. It only stores
the changes for the max lookback period and is only used for CDC queries.
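
To make the shape of this log concrete, here is a toy in-memory model. It is
only a sketch and not Phoenix code: the class CdcIndexLog, its method names, and
the (row key, timestamp) tuple layout are all hypothetical, chosen to mirror the
description above (entries ordered by timestamp per row, retained only for the
max lookback period).

```python
from bisect import insort


class CdcIndexLog:
    """Toy model of a CDC index (hypothetical, not the Phoenix implementation)."""

    def __init__(self, max_lookback_ms):
        self.max_lookback_ms = max_lookback_ms
        # Sorted list of (row_key, timestamp_ms), so the entries of a given
        # row are always ordered by mutation timestamp.
        self.entries = []

    def record_mutation(self, row_key, timestamp_ms):
        # Every data table mutation appends one log entry.
        insort(self.entries, (row_key, timestamp_ms))

    def expire(self, now_ms):
        # Entries older than the max lookback period age out on their own.
        cutoff = now_ms - self.max_lookback_ms
        self.entries = [e for e in self.entries if e[1] >= cutoff]

    def changes_for(self, row_key):
        # Mutation timestamps for one row, oldest first.
        return [ts for key, ts in self.entries if key == row_key]


log = CdcIndexLog(max_lookback_ms=1000)
log.record_mutation("row1", 900)
log.record_mutation("row1", 100)
log.record_mutation("row2", 200)
log.record_mutation("row1", 300)
log.expire(now_ms=1250)  # cutoff is 250, so the entries at 100 and 200 expire
```

After the expire call, only the row1 entries at 300 and 900 remain, without any
explicit delete having been issued.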

For regular indexes, we do index maintenance such that if a data table row is
deleted, we also delete the corresponding index row. This is especially needed
for the correctness of covered indexes, as we use the index alone to serve
queries.

For uncovered indexes, this delete is not necessary for correctness but is
needed for performance, so that we do not scan deleted index rows again and
again or attempt to scan the corresponding deleted data table rows. However,
none of these reasons really applies to CDC indexes. Since CDC index table rows
expire quickly, we do not really need to delete them. It is also expected that
an index row is scanned only once.

In fact, for CDC indexes we add an extra delete marker for each deleted row, so
that there are two delete markers: one whose embedded row timestamp equals the
delete operation timestamp, and another whose embedded row timestamp equals the
latest put operation timestamp of this row. However, we need only the former to
report the delete operation in the correct order.
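
The difference between the two schemes can be sketched as follows. This is an
illustration under assumptions, not the actual index maintenance code: the
DeleteMarker type, the markers_for_delete function, and the maintain_index flag
are hypothetical names introduced only for this example.

```python
from typing import NamedTuple


class DeleteMarker(NamedTuple):
    row_key: str
    embedded_ts: int  # row timestamp embedded in the CDC index row key


def markers_for_delete(row_key, delete_ts, latest_put_ts, maintain_index):
    """Return the delete markers written for one data table row delete.

    Hypothetical sketch: with maintain_index=True it models the two markers
    described above; with False it models the proposed behavior.
    """
    # This marker carries the delete operation timestamp and is the one
    # that reports the delete in the correct order.
    markers = [DeleteMarker(row_key, delete_ts)]
    if maintain_index:
        # Regular index maintenance also targets the index row written by
        # the latest put; for CDC this marker is not needed.
        markers.append(DeleteMarker(row_key, latest_put_ts))
    return markers


current = markers_for_delete("row1", delete_ts=500, latest_put_ts=200,
                             maintain_index=True)
proposed = markers_for_delete("row1", delete_ts=500, latest_put_ts=200,
                              maintain_index=False)
```

In both cases the marker that positions the delete in the change log is the
same one; the proposed scheme simply skips the second write.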

For the same reasons, we do not need to read repair index rows either. Read
repair is currently done to delete orphan index rows. These rows occur if an
index put succeeds but the corresponding data put does not.
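
As a rough illustration of how an orphan index row arises and what read repair
removes, consider the toy model below. It assumes the index put is applied
before the data put, and it represents both tables as plain Python sets of
(row key, timestamp) pairs; write_row, read_repair, and the data_put_fails flag
are hypothetical and do not correspond to Phoenix APIs.

```python
def write_row(data_table, index_table, row_key, ts, data_put_fails=False):
    """Apply one mutation: index put first, then data put (assumed order)."""
    index_table.add((row_key, ts))
    if data_put_fails:
        # The data put never lands, leaving an orphan index row behind.
        return False
    data_table.add((row_key, ts))
    return True


def read_repair(data_table, index_table):
    """Delete index rows that have no matching data table mutation."""
    orphans = index_table - data_table
    index_table.difference_update(orphans)
    return orphans


data_table, index_table = set(), set()
write_row(data_table, index_table, "row1", 100)
write_row(data_table, index_table, "row2", 200, data_put_fails=True)

orphans_before_repair = index_table - data_table
repaired = read_repair(data_table, index_table)
```

For a CDC index, the orphan entry would expire with the max lookback period
anyway, which is why this repair step can be skipped.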

Eliminating index maintenance will improve index performance and simplify its 
code. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
