[
https://issues.apache.org/jira/browse/KAFKA-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863639#comment-17863639
]
Matthias J. Sax commented on KAFKA-13499:
-----------------------------------------
It's been a while since I filed these tickets... I'm not even sure whether I
looked at (i.e., remembered) KAFKA-7934 when I was filing this ticket. So I'm
not sure whether they are _substantively_ different or the same.
The main difference, addressing your second question, is that stream-stream join
state stores are not exposed via IQ, and thus we can be more aggressive and
restore less data compared to windowed and session stores, for which we need to
restore a longer history to make the data available for IQ.
> Avoid restoring outdated records
> --------------------------------
>
> Key: KAFKA-13499
> URL: https://issues.apache.org/jira/browse/KAFKA-13499
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Danica Fine
> Priority: Major
>
> Kafka Streams has the config `windowstore.changelog.additional.retention.ms`
> to allow for an increased retention time.
> While an increased retention time can be useful, it can also lead to
> unnecessary restore cost, especially for stream-stream joins. Assume a
> stream-stream join with 1h window size and a grace period of 1h. For this
> case, we only need 2h of data to restore. If we lag, the
> `windowstore.changelog.additional.retention.ms` helps to prevent the broker
> from truncating data too early. However, if we don't lag and we need to
> restore, we restore everything from the changelog.
> Instead of doing a seek-to-beginning, we could use the timestamp index to
> seek to the first offset older than the 2h "window" of data that we need to
> restore, to avoid unnecessary work.
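
To make the arithmetic in the description concrete, here is a minimal sketch of the idea, not the Kafka Streams implementation. The class and method names are hypothetical, and a `TreeMap` stands in for the broker's timestamp index under the assumption that record timestamps increase with offsets. Against a real cluster the lookup would more likely go through `KafkaConsumer#offsetsForTimes`, which returns the earliest offset whose timestamp is greater than or equal to the requested one. Which clock supplies the reference time (wall-clock or stream time) is also left open here.

```java
import java.time.Duration;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: none of these names exist in Kafka Streams.
final class RestoreStartSketch {

    // Oldest timestamp a stream-stream join still needs: now - (window size + grace period).
    static long restoreStartTimestamp(long nowMs, Duration windowSize, Duration gracePeriod) {
        return nowMs - windowSize.plus(gracePeriod).toMillis();
    }

    // Toy timestamp index (record timestamp -> offset), assumed to be monotonic.
    // Returns the earliest offset whose timestamp is >= targetTs, or -1 if there is none.
    static long firstOffsetAtOrAfter(NavigableMap<Long, Long> timestampIndex, long targetTs) {
        Map.Entry<Long, Long> entry = timestampIndex.ceilingEntry(targetTs);
        return entry == null ? -1L : entry.getValue();
    }

    public static void main(String[] args) {
        // Made-up changelog index: one entry every 1,000,000 ms.
        NavigableMap<Long, Long> index = new TreeMap<>();
        for (long i = 0; i <= 10; i++) {
            index.put(i * 1_000_000L, i * 100L);
        }

        long nowMs = 10_000_000L; // assumed reference time
        long startTs = restoreStartTimestamp(nowMs, Duration.ofHours(1), Duration.ofHours(1));
        long startOffset = firstOffsetAtOrAfter(index, startTs);

        // Restoration would begin at startOffset instead of offset 0.
        System.out.println("restore from timestamp " + startTs + ", offset " + startOffset);
    }
}
```

With a 1h window and a 1h grace period the cutoff is 2h (7,200,000 ms) before the reference time, so everything in the changelog before that point, including whatever the additional retention kept around, would be skipped during restoration.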