[
https://issues.apache.org/jira/browse/KAFKA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guozhang Wang updated KAFKA-7934:
---------------------------------
Labels: new-streams-runtime-should-fix (was: )
> Optimize restore for windowed and session stores
> ------------------------------------------------
>
> Key: KAFKA-7934
> URL: https://issues.apache.org/jira/browse/KAFKA-7934
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Priority: Major
> Labels: new-streams-runtime-should-fix
>
> During state restore of window/session stores, the changelog topic is scanned
> from the oldest entry to the newest entry. This happens on a
> record-by-record basis or in record batches.
> During this process, new segments are created as time advances (based on
> the timestamps of the records that are restored). However, depending on
> the retention time, we might expire these segments again later during the
> restore process. This is wasteful. Because retention time is based on the
> largest timestamp per partition, it is possible to compute a bound for live
> and expired segments upfront (assuming that we know the largest timestamp).
> This way, during restore, we could avoid creating segments that are expired
> later anyway and skip over all corresponding records.
> The problem is that we don't know the largest timestamp per partition. Maybe
> the broker timestamp index could help to provide an approximation for this
> value.
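
To illustrate the bound described in the quoted text, here is a minimal sketch. It is not the Kafka Streams implementation: the class and method names are hypothetical, and it assumes non-negative timestamps, fixed-size segments identified by `timestamp / segmentInterval`, and a largest timestamp that is already known (or under-estimated) for the partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and the segment model are assumptions.
final class RestoreBoundSketch {

    // Assumed mapping from a record timestamp to a fixed-size segment id.
    static long segmentId(long timestampMs, long segmentIntervalMs) {
        return timestampMs / segmentIntervalMs;
    }

    // Smallest segment id that is still live once the largest timestamp
    // of the partition has been observed.
    static long minLiveSegment(long maxTimestampMs, long retentionMs, long segmentIntervalMs) {
        long minLiveTimestampMs = Math.max(0L, maxTimestampMs - retentionMs);
        return segmentId(minLiveTimestampMs, segmentIntervalMs);
    }

    // Returns only the timestamps whose segment would survive the restore;
    // records in older segments are skipped instead of being written and
    // expired again later.
    static List<Long> timestampsToRestore(long[] changelogTimestampsMs,
                                          long maxTimestampMs,
                                          long retentionMs,
                                          long segmentIntervalMs) {
        long minLive = minLiveSegment(maxTimestampMs, retentionMs, segmentIntervalMs);
        List<Long> kept = new ArrayList<>();
        for (long ts : changelogTimestampsMs) {
            if (segmentId(ts, segmentIntervalMs) >= minLive) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

Under this model, an under-estimate of the largest timestamp only makes the bound more conservative (fewer records are skipped), whereas an over-estimate could drop records that are still live. An approximation taken from the broker timestamp index would therefore need to err on the low side.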