[jira] [Updated] (PHOENIX-7473) Eliminating index maintenance for CDC index

Viraj Jasani (Jira) Fri, 13 Dec 2024 13:16:28 -0800


     [ 
https://issues.apache.org/jira/browse/PHOENIX-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Viraj Jasani updated PHOENIX-7473:
----------------------------------
    Fix Version/s: 5.3.0

> Eliminating index maintenance for CDC index
> -------------------------------------------
>
>                 Key: PHOENIX-7473
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7473
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir Ozdemir
>            Priority: Major
>             Fix For: 5.3.0
>
>
> A CDC index is a log of row keys, with one entry for each data table row 
> mutation. These index rows are ordered by mutation timestamp for a given row. 
> The index is used for capturing recent changes to a data table. It stores 
> changes only for the max lookback period and is used only for CDC queries.
> For regular indexes we do index maintenance such that if a data table row is 
> deleted, we also delete the corresponding index row. This is especially 
> needed for covered indexes for correctness as we use the index alone to serve 
> the queries.
> For uncovered indexes, this delete is not necessary for correctness but is 
> needed for performance, so that we do not scan deleted rows again and again 
> and do not attempt to scan the corresponding deleted data table rows. 
> However, none of these reasons really applies to CDC indexes. Since CDC index 
> table rows expire quickly, we do not really need to delete them. It is also 
> expected that a CDC index row is scanned once.
> For CDC indexes we add an extra delete marker for each deleted row, so that 
> there are two delete markers: one with an embedded row timestamp equal to 
> the delete operation timestamp, and the other with an embedded row timestamp 
> equal to the latest put operation timestamp of this row. However, we need 
> only the former to include the delete operation in the correct order.
> For the same reasons, we do not need to read repair index rows for CDC 
> indexes either. Read repair is currently done to delete orphan index rows. 
> These rows occur if an index put succeeds but the corresponding data put 
> does not. 
> Eliminating index maintenance will improve index performance and simplify its 
> code. 
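
The description above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Phoenix code: the names `CdcEntry`, `build_cdc_log`, and `within_lookback` are hypothetical, and the log is modeled as an in-memory list sorted by embedded timestamp. It shows that a single entry per mutation, including one entry at the delete operation timestamp, is enough to replay changes in order, and that entries older than the max lookback period simply drop out of scope without an explicit delete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdcEntry:
    """Hypothetical CDC index row: embedded mutation timestamp plus data row key."""
    timestamp: int
    row_key: str
    kind: str  # "put" or "delete"


def build_cdc_log(mutations):
    """Append one entry per mutation and keep the log ordered by timestamp.

    Assumption: no second delete marker is written at the latest put
    timestamp, and no earlier entry is removed when a row is deleted.
    """
    entries = [CdcEntry(ts, key, kind) for ts, key, kind in mutations]
    return sorted(entries, key=lambda e: (e.timestamp, e.row_key))


def within_lookback(entries, now, max_lookback):
    """Return only the entries that fall inside the max lookback window."""
    return [e for e in entries if now - e.timestamp <= max_lookback]


mutations = [
    (10, "r1", "put"),
    (20, "r1", "put"),
    (30, "r1", "delete"),
]
log = build_cdc_log(mutations)
recent = within_lookback(log, now=35, max_lookback=10)
```

In this model the delete appears once, at timestamp 30, after both puts; the older put entries are never deleted and are simply ignored once they age past the lookback window.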



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
