[ 
https://issues.apache.org/jira/browse/KAFKA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772534#comment-16772534
 ] 

Sophie Blee-Goldman commented on KAFKA-7934:
--------------------------------------------

Is there some reason we can't just work backwards from the last message using a 
putIfAbsent method (which would need to be implemented for these stores, I 
believe)? That would definitely minimize the number of expired records we 
insert and then delete.
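The backwards-scan idea can be sketched as follows. This is a minimal, hypothetical illustration, not the actual state-store code: `ChangelogRecord` and the plain `Map` stand in for real changelog records and a real store, and `restore` is an assumed helper name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BackwardRestore {
    // Stand-in for a changelog record; not an actual Kafka Streams type.
    public record ChangelogRecord(String key, String value) {}

    public static Map<String, String> restore(List<ChangelogRecord> changelog) {
        Map<String, String> store = new LinkedHashMap<>();
        // Iterate from the last (newest) record to the first (oldest).
        for (int i = changelog.size() - 1; i >= 0; i--) {
            ChangelogRecord r = changelog.get(i);
            // putIfAbsent: an older record never overwrites a newer one,
            // so stale values are never inserted only to be deleted later.
            store.putIfAbsent(r.key(), r.value());
        }
        return store;
    }

    public static void main(String[] args) {
        List<ChangelogRecord> log = new ArrayList<>();
        log.add(new ChangelogRecord("a", "v1")); // oldest
        log.add(new ChangelogRecord("b", "v1"));
        log.add(new ChangelogRecord("a", "v2")); // newest value for "a"
        Map<String, String> store = restore(log);
        System.out.println(store.get("a")); // v2
        System.out.println(store.get("b")); // v1
    }
}
```

With a forward scan, "a" would be written twice; scanning backwards with putIfAbsent writes each key at most once.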

> Optimize restore for windowed and session stores
> ------------------------------------------------
>
>                 Key: KAFKA-7934
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7934
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> During state restore of window/session stores, the changelog topic is scanned 
> from the oldest entry to the newest entry, either on a record-per-record 
> basis or in record batches.
> During this process, new segments are created as time advances (based on the 
> timestamps of the records being restored). However, depending on the 
> retention time, we might later expire some of these segments again during the 
> restore process. This is wasteful. Because retention time is based on the 
> largest timestamp per partition, it is possible to compute a bound separating 
> live and expired segments upfront (assuming that we know the largest 
> timestamp). This way, during restore, we could avoid creating segments that 
> would be expired later anyway and skip over all corresponding records.
> The problem is that we don't know the largest timestamp per partition. Maybe 
> the broker timestamp index could help to provide an approximation for this 
> value.
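The bound described in the quoted description can be sketched as a simple predicate. This is a hypothetical illustration under the stated assumptions: `largestTimestamp` is the (possibly approximated) largest record timestamp of the partition, `retentionMs` is the store's retention period, and `shouldRestore` is an assumed helper name, not existing Kafka code.

```java
public class RestoreBound {
    // A record older than largestTimestamp - retentionMs would land in a
    // segment that is already expired by the end of restore, so it can be
    // skipped instead of being inserted and deleted again.
    public static boolean shouldRestore(long recordTimestamp,
                                        long largestTimestamp,
                                        long retentionMs) {
        long expirationBound = largestTimestamp - retentionMs;
        return recordTimestamp >= expirationBound;
    }

    public static void main(String[] args) {
        long largest = 1_000_000L;  // assumed largest timestamp in the partition
        long retention = 100_000L;  // store retention period in ms
        System.out.println(shouldRestore(850_000L, largest, retention)); // false: expired
        System.out.println(shouldRestore(950_000L, largest, retention)); // true: still live
    }
}
```

As the description notes, the open question is where `largestTimestamp` comes from; the broker's timestamp index might only provide an approximation of it.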



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
