[jira] [Updated] (KAFKA-17637) Invert the search for LIST_OFFSETS request for remote storage topic

Kamal Chandraprakash (Jira) Fri, 27 Sep 2024 15:37:50 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-17637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kamal Chandraprakash updated KAFKA-17637:
-----------------------------------------
    Description: 
Record timestamps are not guaranteed to be monotonic, so a LIST_OFFSETS 
request searches from the earliest offset to the latest.
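
As a minimal sketch of why the scan has to be linear, assume the lookup 
returns the earliest offset whose timestamp is greater than or equal to the 
target. The Rec type and firstAtOrAfter helper below are hypothetical names 
used only for illustration; they are not Kafka classes.

```java
import java.util.List;
import java.util.Optional;

class TimestampScan {
    // Hypothetical stand-in for a log record: its offset and its timestamp.
    record Rec(long offset, long timestamp) {}

    // Scan from the earliest offset to the latest and return the first record
    // whose timestamp is >= targetTs. A binary search on timestamp is not
    // safe here because timestamps may be out of order.
    static Optional<Rec> firstAtOrAfter(List<Rec> log, long targetTs) {
        for (Rec r : log) {
            if (r.timestamp() >= targetTs) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Offset 2 carries an older timestamp than offset 1.
        List<Rec> log = List.of(
            new Rec(0, 100), new Rec(1, 300), new Rec(2, 200), new Rec(3, 400));
        Optional<Rec> hit = firstAtOrAfter(log, 250);
        System.out.println(hit.map(Rec::offset).orElse(-1L)); // prints 1
    }
}
```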

When tiered storage is enabled for a topic, the search starts in remote 
storage and then moves to local storage. This is open to a concurrency issue: 
while the search moves from remote to local storage, some local-log segments 
might be uploaded to remote storage and deleted from local storage in the 
meantime. This can lead to a loss of precision in the offset returned for the 
given timestamp. If this happens, the search silently continues in the 
next available local-log segment.
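
The race can be reproduced deterministically in a toy model. The sketch 
below is an illustration only: the two lists stand in for remote and local 
storage, the segment move is performed inline between the two searches, and 
all names (Rec, firstAtOrAfter, remoteFirstWithInterleavedUpload) are 
hypothetical rather than Kafka APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class RemoteFirstRace {
    record Rec(long offset, long timestamp) {}

    static Optional<Rec> firstAtOrAfter(List<Rec> log, long targetTs) {
        return log.stream().filter(r -> r.timestamp() >= targetTs).findFirst();
    }

    // Remote-first search with a segment upload happening between the steps.
    static Optional<Rec> remoteFirstWithInterleavedUpload(long targetTs) {
        List<Rec> remote = new ArrayList<>(List.of(new Rec(0, 100), new Rec(1, 110)));
        List<Rec> local = new ArrayList<>(List.of(
            new Rec(2, 200), new Rec(3, 210),   // local segment A
            new Rec(4, 300), new Rec(5, 310))); // local segment B

        // Step 1: search remote storage.
        Optional<Rec> fromRemote = firstAtOrAfter(remote, targetTs);
        if (fromRemote.isPresent()) {
            return fromRemote;
        }

        // Meanwhile: segment A is uploaded to remote and deleted from local.
        remote.addAll(new ArrayList<>(local.subList(0, 2)));
        local.subList(0, 2).clear();

        // Step 2: search local storage, which no longer contains segment A.
        return firstAtOrAfter(local, targetTs);
    }

    public static void main(String[] args) {
        // The earliest record with timestamp >= 200 is offset 2, but the
        // search never sees it and falls through to segment B.
        long offset = remoteFirstWithInterleavedUpload(200).map(Rec::offset).orElse(-1L);
        System.out.println(offset); // prints 4
    }
}
```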

One way to fix this issue is to search the local-log first, then the 
remote-log, and compare the results. A similar approach for 
MAX_TIMESTAMP is explained in:

[https://github.com/apache/kafka/pull/16602#discussion_r1759757001]

1. Search the local-log and record the result. 
2. Search the remote-log and record the result. 
3. Compare the two results to pick the correct offset.
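
The three steps above could be sketched as follows. This is a toy model, not 
the broker implementation: Rec, firstAtOrAfter and lookup are hypothetical 
names, and the comparison in step 3 assumes that the correct result is the 
one with the lower offset, since a timestamp lookup should return the 
earliest matching offset.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class LocalFirstLookup {
    record Rec(long offset, long timestamp) {}

    static Optional<Rec> firstAtOrAfter(List<Rec> log, long targetTs) {
        return log.stream().filter(r -> r.timestamp() >= targetTs).findFirst();
    }

    static Optional<Rec> lookup(List<Rec> localLog, List<Rec> remoteLog, long targetTs) {
        // Step 1: search the local-log.
        Optional<Rec> local = firstAtOrAfter(localLog, targetTs);
        // Step 2: search the remote-log.
        Optional<Rec> remote = firstAtOrAfter(remoteLog, targetTs);
        // Step 3: compare; keep whichever present result has the lower offset.
        return Stream.of(local, remote)
            .flatMap(Optional::stream)
            .min(Comparator.comparingLong(Rec::offset));
    }

    public static void main(String[] args) {
        // Local search saw only segment B; segment A was already moved to
        // remote storage, where the second search still finds it.
        List<Rec> local = List.of(new Rec(4, 300), new Rec(5, 310));
        List<Rec> remote = List.of(
            new Rec(0, 100), new Rec(1, 110), new Rec(2, 200), new Rec(3, 210));
        System.out.println(lookup(local, remote, 200).map(Rec::offset).orElse(-1L)); // prints 2
    }
}
```

Because segments only move from local to remote storage, and the upload 
happens before the local delete, a segment that disappears from the local-log 
after step 1 should already be visible to the remote search in step 2.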

  was:
Record timestamps are not guaranteed to be monotonic, so a LIST_OFFSETS 
request searches from the earliest offset to the latest.

When tiered storage is enabled for a topic, the search starts in remote 
storage and then moves to local storage. This is open to a concurrency issue: 
while the search moves from remote to local storage, some local-log segments 
might be uploaded to remote storage and deleted from local storage in the 
meantime. This can lead to a loss of precision in the offset returned for the 
given timestamp. If this happens, the search silently continues in the 
next available local-log segment.

One way to fix this issue is to search the local-log first, then the 
remote-log, and compare the results. The approach is explained in:

https://github.com/apache/kafka/pull/16602#discussion_r1759757001


> Invert the search for LIST_OFFSETS request for remote storage topic
> -------------------------------------------------------------------
>
>                 Key: KAFKA-17637
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17637
>             Project: Kafka
>          Issue Type: Task
>            Reporter: Kamal Chandraprakash
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
