[
https://issues.apache.org/jira/browse/PHOENIX-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani updated PHOENIX-7473:
----------------------------------
Fix Version/s: 5.3.0
> Eliminating index maintenance for CDC index
> -------------------------------------------
>
> Key: PHOENIX-7473
> URL: https://issues.apache.org/jira/browse/PHOENIX-7473
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Assignee: Kadir Ozdemir
> Priority: Major
> Fix For: 5.3.0
>
>
> A CDC index is a log of row keys with one entry per data table row mutation.
> These index rows are ordered by mutation timestamp for a given row. This
> index is used for capturing recent changes to a data table. It only stores
> the changes for the max lookback period. It is only used for CDC queries.
> For regular indexes we do index maintenance such that if a data table row is
> deleted, we also delete the corresponding index row. This is especially
> needed for covered indexes for correctness as we use the index alone to serve
> the queries.
> For uncovered indexes, this delete is not necessary for correctness but is
> needed for performance: it avoids scanning deleted index rows again and
> again and attempting to scan the corresponding deleted data table rows.
> However, none of these reasons really applies to CDC indexes. Since CDC
> index table rows expire quickly, we do not really need to delete them. It is
> also expected that a CDC index row is scanned once.
> For CDC indexes we add an extra delete marker for each deleted row, so that
> there are two delete markers: one whose embedded row timestamp equals the
> delete operation timestamp, and another whose embedded row timestamp equals
> the latest put operation timestamp of this row. However, we need only the
> former to include the delete operation in the correct order.
> For the same reasons, we do not need to read-repair index rows for CDC
> indexes either. Read repair is currently done to delete orphan index rows.
> These rows occur if an index put succeeds but the corresponding data put
> does not.
> Eliminating index maintenance will improve index performance and simplify its
> code.
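
The idea in the description above can be illustrated with a small toy model.
This is only a sketch and not Phoenix code: the class CdcIndexSketch, its
methods, and the integer timestamps are hypothetical names chosen for
illustration. It assumes the index is an append-only log in which a delete is
recorded as a single entry at the delete operation timestamp, nothing is ever
removed by index maintenance, and entries simply fall out of view once they
are older than the max lookback period.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CdcIndexSketch:
    """Toy append-only change log (hypothetical, not the Phoenix implementation)."""
    max_lookback: int  # assumed to be in the same units as the timestamps
    entries: List[Tuple[int, str, str]] = field(default_factory=list)

    def record(self, row_key: str, timestamp: int, op: str) -> None:
        # Every mutation, including a delete, appends exactly one entry.
        # Existing entries are never deleted or rewritten.
        self.entries.append((timestamp, row_key, op))

    def changes(self, row_key: str, now: int) -> List[Tuple[int, str]]:
        # Return the mutations of one row inside the lookback window,
        # ordered by mutation timestamp.
        cutoff = now - self.max_lookback
        return sorted(
            (ts, op)
            for ts, key, op in self.entries
            if key == row_key and ts >= cutoff
        )


index = CdcIndexSketch(max_lookback=100)
index.record("row1", 10, "put")
index.record("row1", 20, "put")
index.record("row1", 30, "delete")

recent = index.changes("row1", now=50)
later = index.changes("row1", now=125)
```

In this model the delete at timestamp 30 appears after both puts without any
second marker at the latest put timestamp, and the older puts stop being
returned once they age out of the lookback window.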
--
This message was sent by Atlassian Jira
(v8.20.10#820010)