Viraj Jasani created PHOENIX-7653:
-------------------------------------
Summary: New CDC Event for TTL expired rows
Key: PHOENIX-7653
URL: https://issues.apache.org/jira/browse/PHOENIX-7653
Project: Phoenix
Issue Type: New Feature
Reporter: Viraj Jasani
The purpose of this Jira is to extend the Change Data Capture (CDC)
capabilities to generate CDC events when rows expire due to Time-To-Live (TTL)
settings (literal or conditional) during major compaction. The
implementation ensures that applications consuming CDC streams receive a
notification when data is automatically removed from tables, providing
additional visibility into system-initiated deletions.
The proposed new event_type: *ttl_delete*
Example of a TTL-expired CDC event, assuming the row had two columns, c1 and c2,
with values "v1" and "v2" respectively:
{code:java}
{
  "event_type": "ttl_delete",
  "pre_image": {
    "c1": "v1",
    "c2": "v2"
  },
  "post_image": {}
}
{code}
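As a rough illustration of the event shape above, the following sketch assembles the same JSON from a row's column values. The class TtlDeleteEventSketch and its methods are hypothetical names used only for this example; they are not part of Phoenix, and the actual encoding used by CompactionScanner may differ.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not a Phoenix class: renders a ttl_delete event
// for a row whose column values are given as strings.
final class TtlDeleteEventSketch {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    // Control characters are not handled in this sketch.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the event with the expired row as pre_image and an empty post_image.
    static String toJson(Map<String, String> preImage) {
        String image = preImage.entrySet().stream()
                .map(e -> quote(e.getKey()) + ": " + quote(e.getValue()))
                .collect(Collectors.joining(", ", "{", "}"));
        return "{\"event_type\": \"ttl_delete\", \"pre_image\": " + image
                + ", \"post_image\": {}}";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the column order stable in the output.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("c1", "v1");
        row.put("c2", "v2");
        System.out.println(toJson(row));
    }
}
{code}

The post_image is always empty here because a TTL expiration leaves no surviving version of the row.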
*High-level design steps:*
* Identify the event that causes the row to expire: conditional_ttl, or
maxlookback/ttl expired rows.
* Capture the complete row image for the expiration. The image needs to be
directly inserted into the CDC index. If we do not provide the expired row's
pre-image up front, the CDC index cannot scan it after the major compaction,
because the data table row no longer exists once the major compaction has
expired it. CompactionScanner needs to send the exact CDC JSON structure with
encoded bytes, which the scanner can later send directly to the client
when requested.
* CDCGlobalIndexRegionScanner needs to check for the existence of the special
CF:CQ, which, if found, can be directly returned as the value of the "CDC JSON"
column.
* For a single CF, CompactionScanner needs to perform the mutation to the CDC
index directly, and only once.
* For multiple CFs, CompactionScanner might perform multiple mutations to the
CDC index. Therefore, it should use checkAndMutate to ensure the insert happens
only if the row does not already exist. If the row is already inserted and
another CF's compaction tries to put recent row values, it can update the
existing pre-image.
* In order to distinguish the same PHOENIX_ROW_TIMESTAMP() value for the CDC
index while multiple CF compactions are taking place, CompactionScanner needs
to provide compactionTime as the timestamp value in the CDC index rowkey by
updating the rowkey before performing the mutation.
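The multi-CF steps above can be sketched with an in-memory map standing in for the CDC index. This is only an illustration under stated assumptions: CdcIndexSketch, the "compactionTime|rowKey" key layout, and the idea that each CF's compaction contributes its own columns under the same compactionTime are all hypothetical. The real change would go through HBase's checkAndMutate and the actual CDC index rowkey format, neither of which is shown here.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the CDC index, not a Phoenix class.
final class CdcIndexSketch {

    // Index row key -> pre-image of the expired data row.
    private final Map<String, Map<String, String>> index = new ConcurrentHashMap<>();

    // Assumed key layout: compaction time first, then the data table row key.
    static String indexRowKey(long compactionTime, String dataRowKey) {
        return compactionTime + "|" + dataRowKey;
    }

    // Models the check-and-mutate step: insert the pre-image if the index row
    // is absent, otherwise merge the newly seen cells into the existing one.
    // ConcurrentHashMap.compute applies the function atomically per key.
    void recordExpiredCells(long compactionTime, String dataRowKey,
                            Map<String, String> cells) {
        index.compute(indexRowKey(compactionTime, dataRowKey), (key, existing) -> {
            Map<String, String> merged =
                    existing == null ? new LinkedHashMap<>() : new LinkedHashMap<>(existing);
            merged.putAll(cells);
            return merged;
        });
    }

    // Returns the stored pre-image, or null if no event was recorded.
    Map<String, String> preImage(long compactionTime, String dataRowKey) {
        return index.get(indexRowKey(compactionTime, dataRowKey));
    }

    public static void main(String[] args) {
        CdcIndexSketch idx = new CdcIndexSketch();
        // Two column families of the same row are compacted separately.
        idx.recordExpiredCells(1000L, "row1", Map.of("c1", "v1"));
        idx.recordExpiredCells(1000L, "row1", Map.of("c2", "v2"));
        System.out.println(idx.preImage(1000L, "row1"));
    }
}
{code}

In this model the first compaction to reach a row creates the index entry and later ones only extend its pre-image, so the consumer sees one ttl_delete event per expired row rather than one per column family.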
--
This message was sent by Atlassian Jira
(v8.20.10#820010)