[ https://issues.apache.org/jira/browse/KAFKA-9199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951044#comment-17951044 ]
José Armando García Sancio commented on KAFKA-9199:
---------------------------------------------------

When fixing the issue, please revert this commit in trunk: [13fa453|https://github.com/apache/kafka/commit/13fa4537f53f2524ccf1fd7e79d4d4184e093cc1]

> Improve handling of out of sequence errors lower than last acked sequence
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-9199
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9199
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, producer
>            Reporter: Jason Gustafson
>            Priority: Major
>
> The broker attempts to cache the state of the last 5 batches per producer in
> order to enable duplicate detection. This caching is not guaranteed to
> survive restarts: only the state of the last batch is written to the
> snapshot file. In some cases this can lead to a sequence of events such as
> the following:
> # Send sequence=n
> # Sequence=n is successfully written, but the response is not received
> # The leader changes after a broker restart
> # Send sequence=n+1
> # Receive a successful response for n+1
> # Sequence=n times out and is retried, resulting in an out of order sequence error
>
> There are a couple of problems here. First, it would probably be better for
> the broker to return DUPLICATE_SEQUENCE_NUMBER when it receives a sequence
> number lower than that of any cached batch. Second, the producer currently
> handles this situation by retrying until the delivery timeout expires;
> instead, it should fail the batch immediately.
>
> This issue popped up in the reassignment system test. It ultimately caused
> the test to fail because the producer kept retrying the duplicate batch
> until it finally gave up.
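To make the first proposal concrete, here is a minimal, hypothetical sketch of the broker-side check. The class and method names are illustrative stand-ins, not Kafka's actual ProducerStateManager code: the broker keeps metadata for the last 5 batches per producer, and a sequence at or below the lowest cached one would be answered with DUPLICATE_SEQUENCE_NUMBER rather than OUT_OF_ORDER_SEQUENCE_NUMBER.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the actual broker implementation.
class ProducerSequenceState {
    private static final int MAX_CACHED_BATCHES = 5;

    // Last sequence number of each of the most recent batches, oldest first.
    private final Deque<Integer> cachedLastSequences = new ArrayDeque<>();

    void onBatchAppended(int lastSequence) {
        if (cachedLastSequences.size() == MAX_CACHED_BATCHES)
            cachedLastSequences.removeFirst(); // evict the oldest batch metadata
        cachedLastSequences.addLast(lastSequence);
    }

    enum SequenceCheck { OK, DUPLICATE, OUT_OF_ORDER }

    SequenceCheck check(int incomingFirstSequence) {
        if (cachedLastSequences.isEmpty())
            return SequenceCheck.OK;
        int lowestCached = cachedLastSequences.peekFirst();
        int expectedNext = cachedLastSequences.peekLast() + 1;
        if (incomingFirstSequence == expectedNext)
            return SequenceCheck.OK;
        // Proposed behavior: a sequence at or below the lowest cached batch
        // must be a retry of an already-written batch, so report it as a
        // duplicate rather than an out of order sequence.
        if (incomingFirstSequence <= lowestCached)
            return SequenceCheck.DUPLICATE;
        return SequenceCheck.OUT_OF_ORDER;
    }
}
{code}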
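And a corresponding sketch for the second proposal on the producer side, again with hypothetical names (handleOutOfOrderSequence, failBatch, and retryBatch are placeholders for the Sender's real error-handling path): when the broker reports an out of order sequence for a batch whose sequence is at or below the last acknowledged one, retrying can never succeed, so the batch is failed immediately instead of retried until the delivery timeout.

{code:java}
import org.apache.kafka.common.errors.OutOfOrderSequenceException;

// Hypothetical sketch, not the actual producer Sender code.
class SenderErrorHandlingSketch {
    void handleOutOfOrderSequence(int batchFirstSequence, int lastAckedSequence) {
        if (batchFirstSequence <= lastAckedSequence) {
            // The broker already acknowledged a higher sequence, so this
            // batch almost certainly succeeded on a previous attempt and
            // retrying cannot resolve anything; fail fast.
            failBatch(new OutOfOrderSequenceException(
                "Sequence " + batchFirstSequence + " is below the last acked sequence "
                    + lastAckedSequence + "; failing the batch instead of retrying."));
        } else {
            // A genuine gap ahead of the last acked sequence: the normal
            // retry path may still resolve it.
            retryBatch();
        }
    }

    private void failBatch(RuntimeException cause) {
        // In the real producer this would complete the batch's future
        // exceptionally; here it is just a placeholder.
        throw cause;
    }

    private void retryBatch() {
        // Placeholder for re-enqueueing the batch for another attempt.
    }
}
{code}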