[ 
https://issues.apache.org/jira/browse/KAFKA-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774580#comment-16774580
 ] 

Guozhang Wang commented on KAFKA-7934:
--------------------------------------

We can consider using the `Consumer#offsetsForTimes` API, which indeed leverages 
the broker-side timestamp index to return the offset. The guarantee is 
conservative, as stated in the javadoc:

```
Look up the offsets for the given partitions by timestamp. The returned offset 
for each partition is the earliest offset whose timestamp is greater than or 
equal to the given timestamp in the corresponding partition.
```

One drawback, though, is that this call is blocking, so we may need to 
consider batching it across the changelogs of all window stores.
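
To make the batching idea concrete, here is a rough sketch only, not the actual Streams restore code. It assumes a cutoff timestamp is already known. The `lookup` function is a hypothetical stand-in with the same shape as `Consumer#offsetsForTimes` (one map in, one map out, and a null value when no record has a timestamp at or after the query). The class name, method name, and string partition ids are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class BatchedRestoreOffsets {

    /**
     * Resolves restore start offsets for all changelog partitions with one lookup.
     *
     * @param partitions changelog partitions to restore (hypothetical string ids)
     * @param cutoffMs   records older than this are assumed to be expired already
     * @param lookup     stand-in for Consumer#offsetsForTimes; in real code the map
     *                   key would be a TopicPartition and the value an OffsetAndTimestamp
     */
    static Map<String, Long> startOffsets(List<String> partitions,
                                          long cutoffMs,
                                          Function<Map<String, Long>, Map<String, Long>> lookup) {
        // Build a single query covering every partition, so the blocking
        // lookup is issued once instead of once per changelog.
        Map<String, Long> query = new HashMap<>();
        for (String partition : partitions) {
            query.put(partition, cutoffMs);
        }
        Map<String, Long> found = lookup.apply(query);

        // Keep only partitions with a usable answer. A missing or null entry
        // means "no record at or after the cutoff"; this sketch leaves it to
        // the caller to decide what to do (e.g. restore from the beginning).
        Map<String, Long> result = new HashMap<>();
        for (String partition : partitions) {
            Long offset = found.get(partition);
            if (offset != null) {
                result.put(partition, offset);
            }
        }
        return result;
    }
}
```

Since the javadoc guarantee is conservative, seeking to the returned offset can still replay some records older than the cutoff; it only avoids scanning the changelog prefix before that offset.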

> Optimize restore for windowed and session stores
> ------------------------------------------------
>
>                 Key: KAFKA-7934
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7934
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>
> During state restore of window/session stores, the changelog topic is scanned 
> from the oldest entries to the newest entry. This happens on a 
> record-per-record basis or in record batches.
> During this process, new segments are created while time advances (based on 
> the timestamps of the records that are restored). However, depending on 
> the retention time, we might expire those segments again later during the 
> restore process. This is wasteful. Because retention time is based on the largest 
> timestamp per partition, it is possible to compute a bound for live and 
> expired segments upfront (assuming that we know the largest timestamp). This 
> way, during restore, we could avoid creating segments that are expired later 
> anyway and skip over all corresponding records.
> The problem is that we don't know the largest timestamp per partition. Maybe 
> the broker timestamp index could help to provide an approximation for this 
> value.
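
The bound described in the issue can be illustrated with a small sketch. It assumes the largest timestamp is known (or approximated), and that a segment is identified by its timestamp divided by a fixed segment interval. The class and method names are hypothetical and this is not the actual Streams implementation.

```java
final class SegmentBounds {

    // Assumption: a record belongs to the segment timestamp / segmentIntervalMs.
    static long segmentId(long timestampMs, long segmentIntervalMs) {
        return Math.floorDiv(timestampMs, segmentIntervalMs);
    }

    // Smallest segment id that is still live given the largest timestamp seen.
    static long minLiveSegmentId(long maxTimestampMs, long retentionMs, long segmentIntervalMs) {
        return segmentId(maxTimestampMs - retentionMs, segmentIntervalMs);
    }

    // True if the record falls into a segment that would be expired by the
    // end of the restore anyway, so the restore could skip it.
    static boolean canSkip(long recordTimestampMs, long maxTimestampMs,
                           long retentionMs, long segmentIntervalMs) {
        return segmentId(recordTimestampMs, segmentIntervalMs)
                < minLiveSegmentId(maxTimestampMs, retentionMs, segmentIntervalMs);
    }
}
```

For example, with a largest timestamp of 10000 ms, a retention of 3000 ms, and a segment interval of 1000 ms, segments below id 7 count as expired, so a record with timestamp 6999 could be skipped while one with timestamp 7000 could not.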



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
