[ https://issues.apache.org/jira/browse/KAFKA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772534#comment-16772534 ]
Sophie Blee-Goldman commented on KAFKA-7934:
--------------------------------------------

Is there some reason we can't just work backwards from the last message using a putIfAbsent method (which would need to be implemented for these stores, I believe)? That would definitely minimize the number of expired records we insert and then delete.

> Optimize restore for windowed and session stores
> ------------------------------------------------
>
>                 Key: KAFKA-7934
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7934
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> During state restore of window/session stores, the changelog topic is scanned
> from the oldest entry to the newest, either record by record or in record
> batches.
> During this process, new segments are created as time advances (based on the
> timestamps of the restored records). However, depending on the retention
> time, some of these segments may expire again later during the restore
> process. This is wasteful. Because retention time is based on the largest
> timestamp per partition, it is possible to compute a bound separating live
> and expired segments upfront (assuming we know the largest timestamp). That
> way, during restore, we could avoid creating segments that would be expired
> later anyway and skip over all corresponding records.
> The problem is that we don't know the largest timestamp per partition. Maybe
> the broker's timestamp index could help provide an approximation of this
> value.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
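To illustrate the idea in the issue description, here is a minimal sketch of the proposed expiry bound. All names (`RestoreFilter`, `ChangelogRecord`, `minLiveTimestamp`, `filterExpired`) are hypothetical and not actual Kafka Streams APIs; it only assumes what the ticket states, namely that retention is measured against the largest timestamp in the partition:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the optimization described in KAFKA-7934.
// None of these class or method names exist in Kafka Streams.
public class RestoreFilter {

    // A changelog record reduced to the fields relevant here.
    public record ChangelogRecord(byte[] key, byte[] value, long timestamp) {}

    // Records older than (largestTimestamp - retentionPeriod) belong to
    // segments that would be expired immediately after restore, so they
    // can be skipped instead of inserted and later deleted.
    public static long minLiveTimestamp(long largestTimestamp, long retentionPeriod) {
        return largestTimestamp - retentionPeriod;
    }

    // Drop records that fall below the live bound before restoring a batch.
    public static List<ChangelogRecord> filterExpired(
            List<ChangelogRecord> batch, long largestTimestamp, long retentionPeriod) {
        long bound = minLiveTimestamp(largestTimestamp, retentionPeriod);
        List<ChangelogRecord> live = new ArrayList<>();
        for (ChangelogRecord record : batch) {
            if (record.timestamp() >= bound) {
                live.add(record);
            }
        }
        return live;
    }
}
```

As the description notes, the hard part is obtaining `largestTimestamp` before the scan begins; without it (or an approximation such as one from the broker's timestamp index), the bound cannot be computed upfront.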