virajjasani opened a new pull request, #2209:
URL: https://github.com/apache/phoenix/pull/2209

   Jira: PHOENIX-7653
   
   The purpose of this PR is to extend the Change Data Capture (CDC) 
capabilities to generate CDC events when rows expire due to Time-To-Live (TTL) 
settings (literal or conditional) during major compaction. The 
implementation ensures that applications consuming CDC streams are notified 
when data is automatically removed from tables, providing additional 
visibility into system-initiated deletions.
   
   The proposed new event_type: ttl_delete
   
   Example of a TTL-expired CDC event, assuming the row had two columns, c1 and 
c2, with values "v1" and "v2" respectively:
   ```
   {
     "event_type": "ttl_delete",
     "pre_image": {
       "c1": "v1",
       "c2": "v2"
     },
     "post_image": {}
   }
   ```
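
   As a rough illustration of the event shape above, the following sketch 
   renders a `ttl_delete` event from a row's pre-image using only the Java 
   standard library. The class and method names (`TtlDeleteEventSketch`, 
   `ttlDeleteEvent`, and so on) are hypothetical and are not taken from the 
   Phoenix code base; the real implementation may build and encode the JSON 
   differently.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.StringJoiner;

   // Hypothetical helper, not part of Phoenix: renders a TTL-expiry CDC event by hand.
   final class TtlDeleteEventSketch {

       // Escapes only backslashes and double quotes; enough for this illustration.
       static String quote(String s) {
           return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
       }

       // Renders a flat column -> value map as a JSON object, preserving map order.
       static String toJsonObject(Map<String, String> columns) {
           StringJoiner joiner = new StringJoiner(",", "{", "}");
           for (Map.Entry<String, String> e : columns.entrySet()) {
               joiner.add(quote(e.getKey()) + ":" + quote(e.getValue()));
           }
           return joiner.toString();
       }

       // The post-image is always empty because the row no longer exists after expiry.
       static String ttlDeleteEvent(Map<String, String> preImage) {
           return "{\"event_type\":\"ttl_delete\",\"pre_image\":" + toJsonObject(preImage)
               + ",\"post_image\":{}}";
       }

       public static void main(String[] args) {
           Map<String, String> row = new LinkedHashMap<>();
           row.put("c1", "v1");
           row.put("c2", "v2");
           System.out.println(ttlDeleteEvent(row));
       }
   }
   ```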
   
   High-level design steps:
   
   - Identify the event that causes the row to expire: conditional_ttl or 
maxlookback/ttl expired rows
   - Capture the complete row image at expiration. The image needs to be 
inserted directly into the CDC index. If the expired row's pre-image is not 
provided upfront, the CDC index cannot scan it after the major compaction, 
because the data table row no longer exists once the major compaction has 
expired it. CompactionScanner needs to send the exact CDC JSON structure with 
encoded bytes, which the scanner can later return directly to the client when 
requested.
   - CDCGlobalIndexRegionScanner needs to check for the existence of the 
special CF:CQ; if it is found, its value can be returned directly as the value 
of the "CDC JSON" column.
   - For a single CF, CompactionScanner needs to perform the mutation to the 
CDC index only once.
   - For multiple CFs, CompactionScanner might perform multiple mutations to 
the CDC index. Therefore, it should use checkAndMutate to ensure the mutation 
happens only if the row does not exist. If the row is already inserted and 
another CF compaction tries to put more recent row values, it can update the 
existing pre-image.
   - In order to distinguish the same PHOENIX_ROW_TIMESTAMP() value for the CDC 
index while multiple CF compactions are taking place, CompactionScanner needs 
to provide compactionTime as the timestamp value in the CDC index rowkey by 
updating the rowkey before performing the mutation.
   - Introduce some retries in case of HTable mutation failures.
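
   The multi-CF steps above can be sketched with an in-memory stand-in for the 
   CDC index. This is only a model of the intended semantics (insert if 
   absent, otherwise update the existing pre-image, with bounded retries); it 
   does not use the HBase client API, and every name in it (`CdcIndexSketch`, 
   `indexRowKey`, `upsertPreImage`, `withRetries`) is hypothetical. The rowkey 
   layout and the assumption that compactions of different CFs for the same 
   row share one compactionTime are illustrative guesses, not details 
   confirmed by the PR.

   ```java
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.concurrent.ConcurrentHashMap;

   // In-memory stand-in for the CDC index; not Phoenix or HBase code.
   final class CdcIndexSketch {
       // Index rowkey -> merged pre-image (column -> value).
       private final ConcurrentHashMap<String, Map<String, String>> index =
           new ConcurrentHashMap<>();

       // Builds an illustrative index rowkey that carries compactionTime as its
       // timestamp component (the real rowkey encoding is not shown in the PR).
       static String indexRowKey(long compactionTime, String dataRowKey) {
           return compactionTime + "|" + dataRowKey;
       }

       // Inserts the pre-image if the index row is absent; otherwise merges the
       // incoming columns into the existing pre-image. ConcurrentHashMap.merge
       // is atomic per key, which plays the role of checkAndMutate here.
       void upsertPreImage(String rowKey, Map<String, String> cfColumns) {
           index.merge(rowKey, new TreeMap<>(cfColumns), (existing, incoming) -> {
               Map<String, String> merged = new TreeMap<>(existing);
               merged.putAll(incoming);
               return merged;
           });
       }

       Map<String, String> get(String rowKey) {
           return index.get(rowKey);
       }

       // Retries a failing write up to maxAttempts times before giving up.
       static void withRetries(int maxAttempts, Runnable write) {
           RuntimeException last = null;
           for (int attempt = 1; attempt <= maxAttempts; attempt++) {
               try {
                   write.run();
                   return;
               } catch (RuntimeException e) {
                   last = e; // remember the failure and try again
               }
           }
           throw new IllegalStateException(
               "write failed after " + maxAttempts + " attempts", last);
       }

       public static void main(String[] args) {
           CdcIndexSketch idx = new CdcIndexSketch();
           String key = indexRowKey(1000L, "row1");
           // Two CF compactions of the same expired row write to the same index row.
           withRetries(3, () -> idx.upsertPreImage(key, Map.of("c1", "v1")));
           withRetries(3, () -> idx.upsertPreImage(key, Map.of("c2", "v2")));
           System.out.println(idx.get(key));
       }
   }
   ```

   In this model the second CF's write finds the row already present and 
   merges its columns into the stored pre-image instead of overwriting it.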

