Jason Gustafson created KAFKA-9199:
--------------------------------------

             Summary: Improve handling of out of sequence errors lower than 
last acked sequence
                 Key: KAFKA-9199
                 URL: https://issues.apache.org/jira/browse/KAFKA-9199
             Project: Kafka
          Issue Type: Bug
          Components: producer 
            Reporter: Jason Gustafson


The broker attempts to cache the state of the last 5 batches in order to enable 
duplicate detection. This caching is not guaranteed across restarts: we only 
write the state of the last batch to the snapshot file. It is possible in some 
cases for this to result in a sequence such as the following:
 # Send sequence=n
 # Sequence=n successfully written, but response is not received
 # Leader changes after broker restart
 # Send sequence=n+1
 # Receive successful response for n+1
 # Sequence=n times out and is retried, which results in an out of order sequence error
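
The effect of losing cached batches can be illustrated with a toy model. This is only a sketch, not the broker's actual code: the class and method names are hypothetical, each batch is reduced to a single sequence number, and the replayed history is an assumption that is slightly longer than the abbreviated steps above, so that the retried batch is older than the one batch that survives the restart.

```python
from collections import deque

CACHE_SIZE = 5  # the broker tries to remember this many recent batches


class ProducerStateModel:
    """Toy model of per-producer sequence tracking (hypothetical names)."""

    def __init__(self):
        self.cached = deque(maxlen=CACHE_SIZE)  # sequences of recent batches

    def append(self, sequence):
        if sequence in self.cached:
            return "DUPLICATE"  # still cached, so recognized as already written
        if self.cached and sequence != self.cached[-1] + 1:
            return "OUT_OF_ORDER_SEQUENCE_NUMBER"
        self.cached.append(sequence)
        return "OK"

    def restart(self):
        # Only the state of the last batch is written to the snapshot file.
        last = list(self.cached)[-1:]
        self.cached = deque(last, maxlen=CACHE_SIZE)


def replay():
    state = ProducerStateModel()
    results = [state.append(seq) for seq in (0, 1, 2)]  # all written
    state.restart()                  # cache shrinks from [0, 1, 2] to [2]
    results.append(state.append(3))  # next in order, accepted
    results.append(state.append(1))  # late retry of an already written batch
    return results
```

In this model the late retry of sequence 1 is rejected as out of order after the restart, whereas without the restart it would still be in the cache and be recognized as a duplicate.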

There are a couple of problems here. First, it would probably be better for the 
broker to return DUPLICATE_SEQUENCE_NUMBER when it receives a sequence number 
which is lower than that of any of the cached batches. Second, the producer 
currently handles this situation by retrying until the delivery timeout expires. 
Instead it should just fail the batch.
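
A minimal sketch of the two proposed changes, assuming single-sequence batches. The function names, the "DUPLICATE_IN_CACHE", "FAIL_BATCH" and "RETRY" labels, and the last_acked_sequence parameter are hypothetical and only illustrate the idea; they are not the actual broker or producer code.

```python
def classify(sequence, cached_sequences):
    """Proposed broker-side check; cached_sequences is in ascending order."""
    if not cached_sequences:
        return "OK"  # nothing known about this producer yet
    if sequence in cached_sequences:
        return "DUPLICATE_IN_CACHE"  # hypothetical label for a cached batch
    if sequence < cached_sequences[0]:
        # Lower than every cached batch: treat as a duplicate, not a gap.
        return "DUPLICATE_SEQUENCE_NUMBER"
    if sequence == cached_sequences[-1] + 1:
        return "OK"
    return "OUT_OF_ORDER_SEQUENCE_NUMBER"


def producer_action(error, sequence, last_acked_sequence):
    """Proposed producer-side handling of an out of order sequence error."""
    if error == "OUT_OF_ORDER_SEQUENCE_NUMBER" and sequence < last_acked_sequence:
        # A later sequence was already acked, so retrying cannot succeed.
        return "FAIL_BATCH"
    return "RETRY"
```

For example, with only sequence 8 left in the cache, classify(7, [8]) yields DUPLICATE_SEQUENCE_NUMBER instead of an out of order error, and producer_action("OUT_OF_ORDER_SEQUENCE_NUMBER", 7, 8) fails the batch rather than retrying it until the delivery timeout.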

This issue popped up in the reassignment system test. It caused the test to 
fail because the producer was stuck retrying the duplicate batch until it 
finally gave up.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
