Cameron Lee created SAMZA-1642:

             Summary: Retention time for Kafka changelog topic does not get 
updated when TTL for high-level API join gets changed
                 Key: SAMZA-1642
             Project: Samza
          Issue Type: Bug
          Components: kafka
            Reporter: Cameron Lee


When using the high-level API to join streams, Samza automatically sets up a 
couple of RocksDB stores in order to keep track of each side of the join. The 
retention time of the RocksDB stores is set to the join TTL. These RocksDB 
stores are backed up by Kafka changelogs. Samza will automatically create these 
changelogs in Kafka, and the retention time of the changelogs is set to the 
join TTL as well.


If the Samza job is initially deployed with a certain join TTL, then the Kafka 
changelogs will be created with the retention time set to that initial join 
TTL. If the Samza job is then redeployed with a different join TTL, then the 
retention time for the Kafka changelog will not get updated to the new value. 
However, the RocksDB TTL will get updated. This means that there will be an 
inconsistency between the RocksDB TTL and the Kafka changelog retention time. 
This will cause an issue when the Kafka changelog is needed to bootstrap a 
container, because the Kafka changelog will not properly reflect the data that 
existed in the corresponding RocksDB store on the previous container.

*Potential resources for solution:*

Kafka has an AdminUtils class 
 to fetch and change topic configurations (although these seem to currently be 
deprecated and replaced by AdminClient. Kafka 0.11 has an AdminClient 
 which allows for describing and altering configs for topics. One potential 
solution is that on startup, the Samza job could check the retention time 
config of the changelog topic and update it if necessary.

This message was sent by Atlassian JIRA

Reply via email to