Hi All,

We are exploring CDC ingestion from databases into Iceberg. Some databases, such as TiDB, have row-level TTL enabled. There seem to be the following ways to handle TTL:

1. Emit an explicit CDC event for the row as soon as the TTL window expires.
2. Handle TTL at the reader layer by filtering out expired rows in the Iceberg reader.
3. Delegate the responsibility for filtering out expired rows to the end user (periodically cleaning up expired rows from the Iceberg snapshot).
4. Provide the user with a view that has a filter added to remove expired rows.
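To make options 2 and 4 concrete, here is a minimal sketch of a read-time filter. It assumes each ingested row carries an absolute expiry time in a column I am calling expires_at; that column name, the Row type, and visible_rows are all hypothetical and only for illustration, not part of TiDB or Iceberg.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Row:
    key: str
    value: str
    # Hypothetical column: absolute expiry time carried along by the CDC job.
    expires_at: int


def visible_rows(rows: Iterable[Row], as_of: int) -> List[Row]:
    """Keep only rows whose expiry time is still in the future at `as_of`."""
    return [r for r in rows if r.expires_at > as_of]


# Table state after an insert at time 10; no delete has arrived from the source yet.
table = [Row("Key1", "value1", expires_at=100)]

print(visible_rows(table, as_of=50))   # row is still live
print(visible_rows(table, as_of=105))  # row is hidden without any CDC delete event
```

In a query engine the same check would presumably be a predicate on the expiry column, and option 4 would wrap that predicate in a view so users do not have to remember it.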
It seems that databases like TiDB clean up expired rows periodically via a GC process and emit the corresponding CDC events after the GC runs, not immediately at TTL expiry. So a CDC event may not be emitted in a case like the following:

1. Current time: 10, key: Key1, value: value1, TTL: 100
2. Current time: 110, key: Key1, value: value2, TTL: 200 (TTL updated to 200, value also updated)
3. Current time: 150, the GC process that cleans up expired rows runs and emits CDC events for the expired rows

From time 100 to 110 the record should not be visible, because it had expired. But since no CDC event was emitted, Iceberg will show the record as live between time 100 and 110 as well.

I wanted to know how other folks handle this, or whether there is any recommendation for handling TTL records in Iceberg CDC ingestion.

Regards,
Aditya