[ https://issues.apache.org/jira/browse/KAFKA-9199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951044#comment-17951044 ]
José Armando García Sancio commented on KAFKA-9199:
---------------------------------------------------

When fixing the issue, please revert this commit in trunk: [13fa453|https://github.com/apache/kafka/commit/13fa4537f53f2524ccf1fd7e79d4d4184e093cc1]

> Improve handling of out of sequence errors lower than last acked sequence
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-9199
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9199
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, producer
>            Reporter: Jason Gustafson
>            Priority: Major
>
> The broker attempts to cache the state of the last 5 batches per producer in
> order to enable duplicate detection. This caching is not guaranteed to
> survive restarts: only the state of the last batch is written to the
> snapshot file. In some cases this can lead to a sequence of events such as
> the following:
> # Send sequence=n
> # Sequence=n is successfully written, but the response is not received
> # The leader changes after a broker restart
> # Send sequence=n+1
> # Receive a successful response for n+1
> # Sequence=n times out and is retried, resulting in an out of order sequence error
>
> There are a couple of problems here. First, it would probably be better for
> the broker to return DUPLICATE_SEQUENCE_NUMBER when it receives a sequence
> number lower than that of any cached batch. Second, the producer currently
> handles this situation by retrying until the delivery timeout expires;
> instead, it should fail the batch immediately.
>
> This issue popped up in the reassignment system test. It ultimately caused
> the test to fail because the producer kept retrying the duplicate batch
> until it finally gave up.
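To make the first proposal concrete, here is a minimal, hypothetical sketch of the broker-side check. The class and method names are illustrative stand-ins, not Kafka's actual ProducerStateManager code: the broker keeps metadata for the last 5 batches per producer, and a sequence at or below the lowest cached one would be answered with DUPLICATE_SEQUENCE_NUMBER rather than OUT_OF_ORDER_SEQUENCE_NUMBER.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the actual broker implementation.
class ProducerSequenceState {
    private static final int MAX_CACHED_BATCHES = 5;

    // Last sequence number of each of the most recent batches, oldest first.
    private final Deque<Integer> cachedLastSequences = new ArrayDeque<>();

    void onBatchAppended(int lastSequence) {
        if (cachedLastSequences.size() == MAX_CACHED_BATCHES)
            cachedLastSequences.removeFirst(); // evict the oldest batch metadata
        cachedLastSequences.addLast(lastSequence);
    }

    enum SequenceCheck { OK, DUPLICATE, OUT_OF_ORDER }

    SequenceCheck check(int incomingFirstSequence) {
        if (cachedLastSequences.isEmpty())
            return SequenceCheck.OK;
        int lowestCached = cachedLastSequences.peekFirst();
        int expectedNext = cachedLastSequences.peekLast() + 1;
        if (incomingFirstSequence == expectedNext)
            return SequenceCheck.OK;
        // Proposed behavior: a sequence at or below the lowest cached batch
        // must be a retry of an already-written batch, so report it as a
        // duplicate rather than an out of order sequence.
        if (incomingFirstSequence <= lowestCached)
            return SequenceCheck.DUPLICATE;
        return SequenceCheck.OUT_OF_ORDER;
    }
}
{code}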
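And a corresponding sketch for the second proposal on the producer side, again with hypothetical names (handleOutOfOrderSequence, failBatch, and retryBatch are placeholders for the Sender's real error-handling path): when the broker reports an out of order sequence for a batch whose sequence is at or below the last acknowledged one, retrying can never succeed, so the batch is failed immediately instead of retried until the delivery timeout.

{code:java}
import org.apache.kafka.common.errors.OutOfOrderSequenceException;

// Hypothetical sketch, not the actual producer Sender code.
class SenderErrorHandlingSketch {
    void handleOutOfOrderSequence(int batchFirstSequence, int lastAckedSequence) {
        if (batchFirstSequence <= lastAckedSequence) {
            // The broker already acknowledged a higher sequence, so this
            // batch almost certainly succeeded on a previous attempt and
            // retrying cannot resolve anything; fail fast.
            failBatch(new OutOfOrderSequenceException(
                "Sequence " + batchFirstSequence + " is below the last acked sequence "
                    + lastAckedSequence + "; failing the batch instead of retrying."));
        } else {
            // A genuine gap ahead of the last acked sequence: the normal
            // retry path may still resolve it.
            retryBatch();
        }
    }

    private void failBatch(RuntimeException cause) {
        // In the real producer this would complete the batch's future
        // exceptionally; here it is just a placeholder.
        throw cause;
    }

    private void retryBatch() {
        // Placeholder for re-enqueueing the batch for another attempt.
    }
}
{code}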