[
https://issues.apache.org/jira/browse/PHOENIX-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani updated PHOENIX-7653:
----------------------------------
Fix Version/s: 5.3.0
> New CDC Event for TTL expired rows
> ----------------------------------
>
> Key: PHOENIX-7653
> URL: https://issues.apache.org/jira/browse/PHOENIX-7653
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Viraj Jasani
> Priority: Major
> Fix For: 5.3.0
>
>
> The purpose of this Jira is to extend the Change Data Capture (CDC)
> capabilities to generate CDC events when rows expire due to Time-To-Live
> (TTL) settings (literal or conditional) during major compaction. The
> implementation ensures that applications consuming CDC streams are
> notified when data is automatically removed from tables, providing
> additional visibility into system-initiated deletions.
> The proposed new event_type: *ttl_delete*
> Example of a TTL-expired CDC event, assuming the row had two columns, c1 and
> c2, with values "v1" and "v2" respectively:
> {code:java}
> {
>   "event_type": "ttl_delete",
>   "pre_image": {
>     "c1": "v1",
>     "c2": "v2"
>   },
>   "post_image": {}
> }
> {code}
>
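To make the event shape above concrete, here is a minimal sketch that assembles such a payload from a row's column values. The class and method names (TtlDeleteEventSketch, buildTtlDeleteEvent) are hypothetical and not part of Phoenix, and treating every column value as a string is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not a Phoenix class: builds the ttl_delete event
// shown in the description from a map of column name -> value.
class TtlDeleteEventSketch {

    // Escapes only backslashes and double quotes; a real encoder would also
    // handle control characters and non-string column types.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String buildTtlDeleteEvent(Map<String, String> preImage) {
        // An empty pre-image yields "{}" because of the prefix and suffix.
        StringJoiner columns = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : preImage.entrySet()) {
            columns.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        // The row no longer exists after expiration, so post_image is empty.
        return "{\"event_type\":\"ttl_delete\","
                + "\"pre_image\":" + columns + ","
                + "\"post_image\":{}}";
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("c1", "v1");
        row.put("c2", "v2");
        System.out.println(buildTtlDeleteEvent(row));
    }
}
```

A LinkedHashMap is used so the columns appear in insertion order, which keeps the output stable for comparison.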
> *High-level design steps:*
> * Identify the event that causes the row expiration: conditional_ttl,
> maxlookback/ttl expired rows
> * Capture the complete row image at expiration. The image needs to be
> inserted directly into the CDC index. If the expired row's pre-image is not
> provided upfront, the CDC index cannot scan it after the major compaction,
> because the data table row no longer exists once the major compaction has
> expired it. CompactionScanner needs to send the exact CDC JSON structure with
> encoded bytes, which the scanner can later send directly to the client
> when requested.
> * CDCGlobalIndexRegionScanner needs to check for the existence of the
> special CF:CQ; if found, its value can be returned directly as the value of
> the "CDC JSON" column.
> * For a single CF, CompactionScanner needs to mutate the CDC index directly
> only once.
> * For multiple CFs, CompactionScanner might perform multiple mutations to the
> CDC index. Therefore, it should use checkAndMutate to ensure the insert
> happens only if the row does not exist. If the row is already inserted and
> another CF compaction tries to put more recent row values, it can update the
> existing pre-image.
> * To distinguish rows with the same PHOENIX_ROW_TIMESTAMP() value in the CDC
> index while multiple CF compactions are taking place, CompactionScanner needs
> to provide compactionTime as the timestamp value in the CDC index rowkey by
> updating the rowkey before performing the mutation.
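The multi-CF steps above can be sketched with an in-memory stand-in for the CDC index. This is only an illustration of the "insert if absent, otherwise update the existing pre-image" behavior. It makes no HBase or Phoenix calls, and ConcurrentHashMap.merge is swapped in for checkAndMutate. The class name, the rowkey layout (compactionTime + "|" + dataRowKey), and the assumption that all CF compactions of a row pass the same compactionTime are hypothetical and not specified in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the CDC index, not a Phoenix class.
class CdcIndexMergeSketch {

    // CDC index rowkey -> pre-image columns collected so far.
    private final Map<String, Map<String, String>> index = new ConcurrentHashMap<>();

    // Assumed layout: compactionTime takes the place of the row timestamp
    // in the index rowkey. The real encoding is not given in the description.
    static String indexRowKey(String dataRowKey, long compactionTime) {
        return compactionTime + "|" + dataRowKey;
    }

    // Called once per column family whose compaction expires cells of a row.
    void recordExpiredCells(String dataRowKey, long compactionTime,
                            Map<String, String> cells) {
        index.merge(
                indexRowKey(dataRowKey, compactionTime),
                new LinkedHashMap<>(cells),        // first writer: plain insert
                (existing, incoming) -> {          // later writers: update pre-image
                    Map<String, String> merged = new LinkedHashMap<>(existing);
                    merged.putAll(incoming);
                    return merged;
                });
    }

    // Returns the accumulated pre-image, or null if no event was recorded.
    Map<String, String> preImage(String dataRowKey, long compactionTime) {
        return index.get(indexRowKey(dataRowKey, compactionTime));
    }
}
```

merge applies the insert-or-update atomically per key, which is the property the description asks of checkAndMutate. With a single CF, the merge function is never invoked and the index is written exactly once.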
--
This message was sent by Atlassian Jira
(v8.20.10#820010)