David Arthur created KAFKA-15552:
------------------------------------
Summary: Duplicate Producer ID blocks during ZK migration
Key: KAFKA-15552
URL: https://issues.apache.org/jira/browse/KAFKA-15552
Project: Kafka
Issue Type: Bug
Affects Versions: 3.5.1, 3.4.1, 3.5.0, 3.4.0, 3.6.0
Reporter: David Arthur
Assignee: David Arthur
Fix For: 3.4.2, 3.5.2, 3.6.1
When migrating producer ID blocks from ZK to KRaft, we take the current
producer ID block from ZK and write its "firstProducerId" into the producer
IDs KRaft record. However, in KRaft we store the _next_ producer ID block in
the log, whereas ZK stores the current block. The end result is that the
first block given to a caller of AllocateProducerIds is a duplicate of the
last block allocated in ZK mode.
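To make the off-by-one concrete, here is a minimal sketch of the two
behaviours. It is not the actual migration code: the class and method names
are hypothetical, and the block size of 1000 is an assumption chosen to line
up with the producer ID in the example further down.

```java
// Hypothetical model of the off-by-one. None of these names are Kafka classes.
final class ProducerIdMigrationSketch {
    // Assumed block size, for illustration only.
    static final long BLOCK_SIZE = 1000L;

    // Buggy behaviour: ZK's *current* block start is copied as-is into the
    // KRaft record, which KRaft then interprets as the *next* block to hand out.
    static long buggyNextProducerId(long zkCurrentFirstProducerId) {
        return zkCurrentFirstProducerId;
    }

    // Intended behaviour: the KRaft record should point past the block that
    // ZK has already allocated.
    static long correctedNextProducerId(long zkCurrentFirstProducerId, long blockSize) {
        return zkCurrentFirstProducerId + blockSize;
    }

    public static void main(String[] args) {
        long zkCurrent = 376000L; // first ID of the last block allocated in ZK mode
        System.out.println("buggy next block starts at "
            + buggyNextProducerId(zkCurrent));
        System.out.println("corrected next block starts at "
            + correctedNextProducerId(zkCurrent, BLOCK_SIZE));
    }
}
```

In the buggy case the first KRaft allocation starts at the same ID as the
last ZK allocation, so two brokers can end up owning the same block.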
This can result in duplicate producer IDs being given to transactional or
idempotent producers. In the case of transactional producers, this can cause
long-term problems since the producer IDs are persisted and reused for a long
time.
This bug is possible in the window between the last producer ID block being
allocated by the ZK controller and all the brokers being restarted following
the metadata migration.
Symptoms of this bug include ReplicaManager OutOfOrderSequenceException
errors and possibly some producer epoch validation errors. To see if a
cluster is affected by this bug, search the logs for the offending producer
ID and see if it is being used by more than one producer.
For example, the following error was observed:
{code}
Out of order sequence number for producer 376000 at offset 381338 in partition
REDACTED: 0 (incoming seq. number), 21 (current end sequence number)
{code}
Searching the org.apache.kafka.clients.producer.internals.TransactionManager
logs for "376000" then shows the same producer ID being provisioned on two
brokers:
{code}
Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1
Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1
{code}
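The manual search above could be automated along these lines. This is only a
sketch: the class name is hypothetical, and the regex assumes log lines shaped
like the two quoted above. It groups client IDs by producer ID and keeps only
the IDs claimed by more than one client.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Kafka.
final class DuplicateProducerIdScan {
    // Assumes lines of the form
    // "[Producer clientId=X ...] ProducerId set to N with epoch E".
    private static final Pattern LINE = Pattern.compile(
        "clientId=([^\\]\\s,]+)[^\\]]*\\].*?ProducerId set to (\\d+) with epoch");

    // Returns producer ID -> client IDs, restricted to IDs seen with 2+ clients.
    static Map<Long, Set<String>> duplicates(List<String> lines) {
        Map<Long, Set<String>> clientsById = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                clientsById
                    .computeIfAbsent(Long.parseLong(m.group(2)), k -> new TreeSet<>())
                    .add(m.group(1));
            }
        }
        // Drop producer IDs that only one client ever reported.
        clientsById.values().removeIf(clients -> clients.size() < 2);
        return clientsById;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1",
            "Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1");
        System.out.println(duplicates(lines));
    }
}
```

A hit here is only a hint, since the check relies on client IDs being
distinct per producer; the matching log lines still need to be read by hand.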