[
https://issues.apache.org/jira/browse/KAFKA-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772375#comment-17772375
]
Justine Olshan commented on KAFKA-15552:
----------------------------------------
Recovering when this happens with idempotent producers isn't too bad: we can
restart the producer that had the duplicate producer ID.
As for the transactional case, I would need to think a bit about how to recover
from that. We would need to force a new producer ID, either via a tombstone for
the old one or some other mechanism (think of something like epoch overflow on
startup).
> Duplicate Producer ID blocks during ZK migration
> ------------------------------------------------
>
> Key: KAFKA-15552
> URL: https://issues.apache.org/jira/browse/KAFKA-15552
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.4.0, 3.5.0, 3.4.1, 3.6.0, 3.5.1
> Reporter: David Arthur
> Assignee: David Arthur
> Priority: Critical
> Fix For: 3.4.2, 3.5.2, 3.6.1
>
>
> When migrating producer ID blocks from ZK to KRaft, we are taking the current
> producer ID block from ZK and writing its "firstProducerId" into the
> producer IDs KRaft record. However, in KRaft we store the _next_ producer ID
> block in the log rather than storing the current block like ZK does. The end
> result is that the first block given to a caller of AllocateProducerIds is a
> duplicate of the last block allocated in ZK mode.
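To make the off-by-one concrete, here is a minimal sketch of the two storage conventions described above. It is a toy model, not Kafka's controller code: the class and method names are hypothetical, the block size of 1000 and the ZK block start of 376000 are assumptions chosen to line up with the producer ID quoted in this ticket, and migrateSkippingCurrentBlock is only one way to avoid the overlap within this model, not necessarily the fix that was shipped.

```java
final class ProducerIdBlockMigrationSketch {
    // Assumed block size, for illustration only.
    static final long BLOCK_SIZE = 1000L;

    // ZK convention: the stored value is the first ID of the block that is
    // currently allocated. KRaft convention: the stored value is the first ID
    // of the next block to hand out.

    /** Migration as described in the report: copy the ZK value unchanged. */
    static long migrateCopyingCurrentBlock(long zkCurrentBlockFirstId) {
        return zkCurrentBlockFirstId;
    }

    /** Alternative within this model: advance past the block ZK already allocated. */
    static long migrateSkippingCurrentBlock(long zkCurrentBlockFirstId) {
        return zkCurrentBlockFirstId + BLOCK_SIZE;
    }

    /** True if the first block served from KRaft overlaps the last ZK block. */
    static boolean overlapsLastZkBlock(long zkCurrentBlockFirstId, long kraftNextBlockFirstId) {
        long zkLast = zkCurrentBlockFirstId + BLOCK_SIZE - 1;
        long kraftLast = kraftNextBlockFirstId + BLOCK_SIZE - 1;
        return kraftNextBlockFirstId <= zkLast && zkCurrentBlockFirstId <= kraftLast;
    }

    public static void main(String[] args) {
        long zkBlock = 376000L; // hypothetical start of the last block allocated in ZK mode
        long copied = migrateCopyingCurrentBlock(zkBlock);
        long skipped = migrateSkippingCurrentBlock(zkBlock);
        System.out.println("copied  -> overlap: " + overlapsLastZkBlock(zkBlock, copied));
        System.out.println("skipped -> overlap: " + overlapsLastZkBlock(zkBlock, skipped));
    }
}
```

In this model, copying the ZK value unchanged makes the first KRaft block identical to the last ZK block, which is the duplication the description reports.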
>
> This can result in duplicate producer IDs being given to transactional or
> idempotent producers. In the case of transactional producers, this can cause
> long term problems since the producer IDs are persisted and reused for a long
> time.
> The bug can occur in the window between the ZK controller allocating its last
> producer ID block and all brokers being restarted following the metadata
> migration.
>
> Symptoms of this bug will include ReplicaManager OutOfOrderSequenceException
> and possibly some producer epoch validation errors. To see if a cluster is
> affected by this bug, search for the offending producer ID and see if it is
> being used by more than one producer.
>
> For example, the following error was observed:
> {code}
> Out of order sequence number for producer 376000 at offset 381338 in
> partition REDACTED: 0 (incoming seq. number), 21 (current end sequence
> number)
> {code}
> Then, searching the
> org.apache.kafka.clients.producer.internals.TransactionManager logs for
> "376000" shows that two brokers both report the same producer ID being
> provisioned:
> {code}
> Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1
> Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1
> {code}
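The manual search above could be automated roughly as follows. This is a hedged sketch that assumes log lines shaped like the two quoted in the description ("clientId=... ProducerId set to N with epoch M"). The class name, method name, and regular expression are hypothetical, and the third sample line in main is made up for contrast. Grouping by clientId also assumes that distinct producers use distinct client IDs, which may not hold in every deployment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DuplicateProducerIdScan {
    // Captures the clientId, the producer ID, and the epoch from one log line.
    private static final Pattern PID_LINE = Pattern.compile(
        "clientId=([^\\]\\s,]+).*?ProducerId set to (\\d+) with epoch (\\d+)");

    /** Returns producer IDs that were provisioned to more than one clientId. */
    static Map<Long, Set<String>> findSharedProducerIds(List<String> logLines) {
        Map<Long, Set<String>> clientsByProducerId = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = PID_LINE.matcher(line);
            if (m.find()) {
                long producerId = Long.parseLong(m.group(2));
                clientsByProducerId
                    .computeIfAbsent(producerId, k -> new TreeSet<>())
                    .add(m.group(1));
            }
        }
        // Keep only producer IDs seen with at least two different client IDs.
        clientsByProducerId.values().removeIf(clients -> clients.size() < 2);
        return clientsByProducerId;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1",
            "Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1",
            // Made-up line for contrast, not taken from the ticket.
            "Broker 2 [Producer clientId=REDACTED-2] ProducerId set to 377000 with epoch 0");
        System.out.println(findSharedProducerIds(lines));
    }
}
```

Any producer ID left in the returned map is a candidate for the duplication described in this ticket and would still need to be confirmed against the broker-side OutOfOrderSequenceException errors.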