[ 
https://issues.apache.org/jira/browse/KAFKA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771164#comment-16771164
 ] 

John Roesler commented on KAFKA-7934:
-------------------------------------

Thanks [~mjsax], this is a really good idea.

 

Just adding a few thoughts,

We do bound the inefficiency of the restore operation for window stores by 
setting the retention time on the changelog equal to the retention time of the 
store, but the broker isn't necessarily very prompt in cleaning up expired 
records. This ticket would close the gap by skipping records that are expired 
but haven't been cleaned up by the broker yet.
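To make the skip concrete, here is a minimal sketch (hypothetical names, plain Java, not the actual Streams restore code) of filtering changelog records against an expiry bound, assuming we already have an approximation of the partition's largest timestamp:

```java
import java.util.ArrayList;
import java.util.List;

public class RestoreSkipSketch {
    /**
     * Hypothetical filter: given changelog record timestamps in offset order
     * and an (approximate) largest timestamp for the partition, keep only
     * records still within retention. Expired records would otherwise be
     * restored into segments that are dropped again later anyway.
     */
    static List<Long> liveRecords(List<Long> changelogTimestamps,
                                  long approxMaxTimestamp,
                                  long retentionMs) {
        long streamTime = approxMaxTimestamp;
        List<Long> live = new ArrayList<>();
        for (long ts : changelogTimestamps) {
            // stream time only ever advances; a record can raise it further
            streamTime = Math.max(streamTime, ts);
            if (ts >= streamTime - retentionMs) {
                live.add(ts); // still within retention: worth restoring
            }
            // else: skip the record; its segment is already expired
        }
        return live;
    }
}
```

With `retentionMs = 100` and a known max timestamp of 120, a record at timestamp 10 would be skipped while records at 50 and 120 would be restored.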

Come to think of it, is there a problem with having the broker clean up 
changelog records based on wall-clock time, since the expiration semantics are 
based on stream time?

 

Adding to the broker-timestamp-index idea: a client-side option we could 
consider is to read the last message in the topic during initialization to get 
an approximation of the latest stream time, before proceeding to scan the 
changelog from earliest as usual. Of course, since timestamps aren't guaranteed 
to be non-decreasing, the latest record isn't guaranteed to have the highest 
timestamp, but presumably it's a better starting point than the timestamp of 
the earliest record.
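A toy illustration of that caveat (plain Java, no Kafka client calls; the method names are illustrative): seeding from the last record can undershoot the true maximum when timestamps arrive out of order, which is why the seed should only be treated as an approximation.

```java
import java.util.List;

public class StreamTimeSeedSketch {
    /**
     * Approximate the latest stream time by the last record's timestamp.
     * Because timestamps are not guaranteed to be non-decreasing, this can
     * be lower than the true maximum, but it is typically a far better
     * starting point than the first record's timestamp.
     */
    static long seedFromLastRecord(List<Long> topicTimestamps) {
        return topicTimestamps.get(topicTimestamps.size() - 1);
    }

    /** The true largest timestamp, which a full scan would discover. */
    static long trueMax(List<Long> topicTimestamps) {
        long max = Long.MIN_VALUE;
        for (long ts : topicTimestamps) {
            max = Math.max(max, ts);
        }
        return max;
    }
}
```

For timestamps [5, 100, 90], the seed is 90 while the true maximum is 100; the restore scan would still advance stream time past the seed as it encounters the 100.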

Thanks,

-John

> Optimize restore for windowed and session stores
> ------------------------------------------------
>
>                 Key: KAFKA-7934
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7934
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> During state restore of window/session stores, the changelog topic is scanned 
> from the oldest entry to the newest entry. This happens on a 
> record-per-record basis or in record batches.
> During this process, new segments are created as time advances (based on 
> the timestamps of the records that are restored). However, depending on 
> the retention time, we might expire segments again later during the restore 
> process. This is wasteful. Because retention time is based on the largest 
> timestamp per partition, it is possible to compute a bound for live and 
> expired segments upfront (assuming that we know the largest timestamp). This 
> way, during restore, we could avoid creating segments that are expired later 
> anyway and skip over all corresponding records.
> The problem is that we don't know the largest timestamp per partition. Maybe 
> the broker timestamp index could help to provide an approximation for this 
> value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
