mjsax commented on pull request #10953:
URL: https://github.com/apache/kafka/pull/10953#issuecomment-879596929


   > There could be data loss, because locally the windowed state store would 
store records for a longer period of time than in the changelog topic. If a 
Kafka Streams client is restarted with wiped-out state, it might restore fewer 
records into the state store than the state store had before the restart. In 
other words, if the Kafka Streams state store were not restarted, it would have 
more records than after the restart. This might happen because records that are 
within the larger retention time of the windowed store (i.e. the segments) 
might be outside the shorter retention time of the changelog topic, hence those 
records might have already been removed from the changelog topic before 
restoration starts.
   
   Yes, but the point is that if the _guaranteed_ retention time is T and T is 
applied to the changelog, then applying T+X to the state store does not mean we 
_lose_ data in this case: we only guarantee to hold data up to T anyway, and 
that guarantee is met. In the end, the changelog topic is the source of truth, 
not the state store.
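
   To make the argument concrete, here is a minimal sketch in plain Java. It is 
a simplified model, not Kafka Streams code: the class `RetentionSketch`, the 
helper `retained`, and all timestamps are hypothetical, and it ignores the 
segment granularity of real windowed stores. It only illustrates that a store 
restored from a changelog with retention T still holds every record inside the 
guaranteed window, even though the local store with retention T+X held more 
before the restart.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RetentionSketch {

    // Hypothetical helper: returns the record timestamps that are still
    // retained at time `now` under the given retention period.
    static List<Long> retained(List<Long> timestamps, long now, long retentionMs) {
        return timestamps.stream()
                .filter(ts -> ts > now - retentionMs)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        long t = 100L;    // guaranteed retention T (applied to the changelog)
        long x = 50L;     // extra local retention X (state store keeps T + X)
        long now = 1_000L;

        // Made-up record timestamps, for illustration only.
        List<Long> written = List.of(860L, 880L, 920L, 990L);

        // What the local store holds before the restart (retention T + X).
        List<Long> localBeforeRestart = retained(written, now, t + x);

        // What the changelog still holds (retention T); a wiped store is
        // restored from exactly this.
        List<Long> restoredFromChangelog = retained(written, now, t);

        // The guarantee covers only records within T of `now`.
        List<Long> guaranteed = retained(written, now, t);

        System.out.println("local before restart: " + localBeforeRestart);
        System.out.println("restored from changelog: " + restoredFromChangelog);
        System.out.println("guarantee met: "
                + restoredFromChangelog.containsAll(guaranteed));
    }
}
```

   In this model the restored store has fewer records than the local store had 
before the restart, but nothing within T is missing, which is the sense in 
which no guaranteed data is lost.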

