[
https://issues.apache.org/jira/browse/KAFKA-14034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denis Rystsov updated KAFKA-14034:
----------------------------------
Summary: Consistency violation: enabled idempotency doesn't prevent
duplicates when a client runs into UNKNOWN_SERVER_ERROR (was: Consistency
violation: enabled idempotency doesn't prevent dublicates when a client runs
into UNKNOWN_SERVER_ERROR)
> Consistency violation: enabled idempotency doesn't prevent duplicates when a
> client runs into UNKNOWN_SERVER_ERROR
> ------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-14034
> URL: https://issues.apache.org/jira/browse/KAFKA-14034
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Affects Versions: 3.0.0, 3.1.1
> Reporter: Denis Rystsov
> Priority: Major
>
> Hey folks, I've observed duplicated records in the log while idempotency was
> enabled and it looks like the kafka client is the culprit. I've tested on
> 3.0.0 but the tip of the kafka repo is also affected
> Let a user sends two produce requests without async so there is two inflight
> requests
> {code:java}
> producer.send(A)
> producer.send(B){code}
> Let the first request results with a retry-able error after it was written to
> disk and let the second request results with UNKNOWN_SERVER_ERROR. Any
> unhandled exception on the broker side results in UNKNOWN_SERVER_ERROR so it
> may happen.
> Since request A is retry-able it is put into the outbound queue there -
> [https://github.com/apache/kafka/blob/1daa149730e3c56b7b6fe8f14369178273e8efc4/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L623]
> Let B's UNKNOWN_SERVER_ERROR is received before A is retried. It is being
> processed in the following methods:
> *
> [https://github.com/apache/kafka/blob/1daa149730e3c56b7b6fe8f14369178273e8efc4/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L642]
> *
> [https://github.com/apache/kafka/blob/1daa149730e3c56b7b6fe8f14369178273e8efc4/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L742]
> *
> [https://github.com/apache/kafka/blob/1daa149730e3c56b7b6fe8f14369178273e8efc4/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L761]
> *
> [https://github.com/apache/kafka/blob/1daa149730e3c56b7b6fe8f14369178273e8efc4/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L624]
> * maybeTransitionToErrorState doesn't consider UNKNOWN_SERVER_ERROR fatal so
> it doesn't mark the request as such:
> [https://github.com/apache/kafka/blob/1daa149730e3c56b7b6fe8f14369178273e8efc4/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L611]
> * as result handleFailedBatch requests epoch bump
> [https://github.com/apache/kafka/blob/1daa149730e3c56b7b6fe8f14369178273e8efc4/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L652]
> When epoch is bumped it rewrites sequence numbers of the inflight requests:
> [https://github.com/apache/kafka/blob/1daa149730e3c56b7b6fe8f14369178273e8efc4/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L481]
> In our case it rewrites A's sequence numbers and when the request is retried
> the broker can't dedupe it and writes it to the log thus violating the
> idempotency guarantees.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)