[ https://issues.apache.org/jira/browse/PHOENIX-7473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani updated PHOENIX-7473:
----------------------------------
    Fix Version/s: 5.3.0

> Eliminating index maintenance for CDC index
> -------------------------------------------
>
>                 Key: PHOENIX-7473
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7473
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir Ozdemir
>            Priority: Major
>             Fix For: 5.3.0
>
>
> A CDC index is a log of row keys for each data table row mutation. These
> index rows are ordered by mutation timestamp for a given row. This index is
> used for capturing recent changes to a data table. It only stores the changes
> for the max lookback period. It is only used for CDC queries.
> For regular indexes we do index maintenance such that if a data table row is
> deleted, we also delete the corresponding index row. This is especially
> needed for covered indexes for correctness, as we use the index alone to
> serve the queries.
> For uncovered indexes, this delete is not necessary for correctness but is
> needed for performance reasons: to avoid scanning deleted rows again and
> again, and to avoid attempting to scan the corresponding deleted data table
> rows. However, none of these reasons really applies to CDC indexes. Since CDC
> index table rows expire quickly, we do not really need to delete them. It is
> also expected that a CDC index row is scanned only once.
> For CDC indexes we add an extra delete marker for each deleted row, giving
> two delete markers: one with the embedded row timestamp value equal to the
> delete operation timestamp, and the other with the embedded row timestamp
> value equal to the latest put operation timestamp of this row. However, we
> need only the former for including the delete operation in the correct order.
> For the same reasons, we do not need to read repair index rows for CDC
> indexes either. Read repair is currently done to delete the orphan index
> rows. These rows occur if an index put succeeds but the corresponding data
> put does not.
> Eliminating index maintenance will improve index performance and simplify its
> code.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)