mjsax commented on pull request #10953: URL: https://github.com/apache/kafka/pull/10953#issuecomment-879596929
> There could be data loss, because locally the windowed state store would store records for a longer period of time than in the changelog topic. If a Kafka Streams client is restarted with wiped-out state, it might restore fewer records into the state store than the state store had before the restart. In other words, if the Kafka Streams state store were not restarted, it would have more records than after the restart. This might happen because records that are within the larger retention time of the windowed store (i.e. the segments) might be outside the shorter retention time of the changelog topic, hence those records might have already been removed from the changelog topic before restoration starts.

Yes, but the point is that if the _guaranteed_ retention time is T and T is applied to the changelog, the fact that T+X is applied to the state store does not mean we _lose_ data in this case: we only guaranteed to hold data up to T anyway, and that guarantee is met. In the end, the changelog topic is the source of truth, not the state store.
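
To make the T versus T+X argument concrete, here is a minimal sketch in plain Java. It does not use any Kafka API: the class `RetentionSketch`, the method `retained`, and all timestamps are hypothetical, and retention is modeled as an exact per-record cutoff, which is a simplification of how segments and topic retention actually expire data.

```java
import java.util.ArrayList;
import java.util.List;

public class RetentionSketch {
    // Hypothetical helper: keep the record timestamps whose age at `now`
    // is within `retentionMs`. Models retention as an exact cutoff.
    public static List<Long> retained(List<Long> timestamps, long now, long retentionMs) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestamps) {
            if (now - ts <= retentionMs) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long t = 100L;    // guaranteed retention T, applied to the changelog (assumed value)
        long x = 50L;     // extra retention X that only the local store has (assumed value)
        long now = 1000L; // current stream time (assumed value)
        List<Long> written = List.of(860L, 880L, 920L, 990L);

        // Before the restart the local store holds everything within T+X.
        List<Long> localStore = retained(written, now, t + x);
        // The changelog only holds what is within T.
        List<Long> changelog = retained(written, now, t);
        // After a restart with wiped state, the store is rebuilt from the changelog.
        List<Long> restoredStore = new ArrayList<>(changelog);
        // What the guarantee actually promises: everything within T.
        List<Long> guaranteed = retained(written, now, t);

        System.out.println("local store before restart: " + localStore);
        System.out.println("store after restore:        " + restoredStore);
        System.out.println("guarantee met:              " + restoredStore.containsAll(guaranteed));
    }
}
```

In this sketch the restored store is smaller than the store before the restart, but every record inside the guaranteed window T is still present, which is the distinction the reply draws.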