[ https://issues.apache.org/jira/browse/KAFKA-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861282#comment-17861282 ]
A. Sophie Blee-Goldman commented on KAFKA-13499:
------------------------------------------------
A few quick clarifying questions for you, [~mjsax]:
# Is this ticket substantively different from KAFKA-7934 ?
# Why "especially for stream-stream joins"?
> Avoid restoring outdated records
> --------------------------------
>
> Key: KAFKA-13499
> URL: https://issues.apache.org/jira/browse/KAFKA-13499
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Danica Fine
> Priority: Major
>
> Kafka Streams has the config `windowstore.changelog.additional.retention.ms`
> to allow for an increased retention time.
> While an increased retention time can be useful, it can also lead to
> unnecessary restore cost, especially for stream-stream joins. Assume a
> stream-stream join with 1h window size and a grace period of 1h. For this
> case, we only need 2h of data to restore. If we lag, the
> `windowstore.changelog.additional.retention.ms` helps to prevent the broker
> from truncating data too early. However, if we don't lag and we need to
> restore, we restore everything from the changelog.
> Instead of doing a seek-to-beginning, we could use the timestamp index to
> seek to the first offset that falls inside the 2h "window" of data that we
> need to restore, to avoid unnecessary work.
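To make the proposal in the description concrete, here is a minimal sketch of the arithmetic and the lookup it implies. It is not how Kafka Streams restores state today and not a proposed patch. The class `RestoreStartSketch`, both method names, and the `TreeMap` standing in for the broker's timestamp index are all hypothetical. The sketch also assumes record timestamps increase with offsets, which may not hold for a real changelog. With the real consumer API, the analogous lookup would presumably be `KafkaConsumer#offsetsForTimes` followed by `seek`; that wiring is not shown here.

```java
import java.time.Duration;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RestoreStartSketch {

    // Oldest timestamp still needed for the join: now - (window size + grace period).
    static long restoreStartTimestamp(long nowMs, Duration windowSize, Duration grace) {
        return nowMs - windowSize.plus(grace).toMillis();
    }

    // Hypothetical stand-in for the timestamp index: record timestamp -> offset.
    // Returns the offset of the earliest entry with timestamp >= targetTs,
    // or endOffset when every indexed record is older than the target.
    static long firstOffsetToRestore(NavigableMap<Long, Long> index, long targetTs, long endOffset) {
        Map.Entry<Long, Long> entry = index.ceilingEntry(targetTs);
        return entry == null ? endOffset : entry.getValue();
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        long now = 10 * hour;

        // 1h window + 1h grace means only the last 2h of data is needed.
        long startTs = restoreStartTimestamp(now, Duration.ofHours(1), Duration.ofHours(1));

        // Made-up changelog contents, for illustration only.
        NavigableMap<Long, Long> index = new TreeMap<>();
        index.put(6 * hour, 0L);
        index.put(7 * hour, 100L);
        index.put(8 * hour + hour / 2, 200L);
        index.put(9 * hour, 300L);

        long startOffset = firstOffsetToRestore(index, startTs, 400L);
        System.out.println("restore from timestamp " + startTs + ", offset " + startOffset);
    }
}
```

In this made-up example the restore would begin at offset 200 rather than offset 0, skipping the records written at the 6h and 7h marks.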