Viraj Jasani created PHOENIX-7513:
-------------------------------------

             Summary: Clean-up CDC partition metadata for closed partitions
                 Key: PHOENIX-7513
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7513
             Project: Phoenix
          Issue Type: Sub-task
            Reporter: Viraj Jasani


Phoenix CDC partitions fall into two categories:
 # Open partitions: Any partition whose corresponding data table region is 
currently active is considered an open partition. The data table region can 
continue to serve read/write requests until it is split into two daughter 
regions or merged with other parent regions into one region.
 # Closed partitions: Any partition whose corresponding data table region is 
no longer alive, and is either ready to be archived or already archived after 
being split or merged into new region(s), is considered a closed partition. The 
data table region is no longer live and hence can no longer serve 
read/write requests.

Once parent region(s) are split or merged into child region(s), metadata for the 
closed partitions should stay in SYSTEM.CDC_STREAM for at least a predetermined 
stream metadata TTL (say, 24 hr by default). After this duration, the records 
should be cleaned up.

 

The cleanup can be performed in either of two ways:

Either, use a background Task that cleans up partitions that have been closed, 
i.e. the rows with a non-null PARTITION_END_TIME and a PHOENIX_ROW_TIMESTAMP() 
value less than current time - TTL (24 hr).

Or, use Conditional TTL with a condition like:
{code:java}
TTL_EXPRESSION = CASE WHEN PHOENIX_ROW_TIMESTAMP() < (CURRENT_TIME() - 24 hr) 
AND PARTITION_END_TIME IS NOT NULL THEN 0 ELSE <FOREVER> END{code}
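
Both options rely on the same eligibility rule, so it may help to see it in isolation. The following is a minimal sketch only, not Phoenix code: the class ClosedPartitionCleanup, the method isEligibleForCleanup and the constant STREAM_METADATA_TTL are hypothetical names, the timestamps are modeled as java.time.Instant, and the 24 hr value is the default suggested above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of the cleanup rule; not part of the Phoenix code base.
class ClosedPartitionCleanup {

    // Assumed default retention, taken from the "24 hr" figure in the description.
    static final Duration STREAM_METADATA_TTL = Duration.ofHours(24);

    /**
     * Returns true when a SYSTEM.CDC_STREAM row could be cleaned up.
     *
     * partitionEndTime models PARTITION_END_TIME (null while the partition is open).
     * rowTimestamp models the value of PHOENIX_ROW_TIMESTAMP() for the row.
     */
    static boolean isEligibleForCleanup(Instant partitionEndTime,
                                        Instant rowTimestamp,
                                        Instant now,
                                        Duration ttl) {
        if (partitionEndTime == null) {
            // Open partition: its metadata must be retained.
            return false;
        }
        // Closed partition: eligible only when the row timestamp is strictly
        // older than (now - TTL), mirroring the "<" in the condition above.
        return rowTimestamp.isBefore(now.minus(ttl));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant closedLongAgo = now.minus(Duration.ofHours(30));
        Instant closedRecently = now.minus(Duration.ofHours(1));

        System.out.println("open partition:          "
                + isEligibleForCleanup(null, closedLongAgo, now, STREAM_METADATA_TTL));
        System.out.println("closed 1 hr ago:         "
                + isEligibleForCleanup(closedRecently, closedRecently, now, STREAM_METADATA_TTL));
        System.out.println("closed 30 hr ago:        "
                + isEligibleForCleanup(closedLongAgo, closedLongAgo, now, STREAM_METADATA_TTL));
    }
}
```

A background Task would apply this check per row and delete the matches, whereas the Conditional TTL option would evaluate the equivalent expression on the server side.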



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
