[ https://issues.apache.org/jira/browse/KAFKA-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790511#comment-17790511 ]
ASF GitHub Bot commented on KAFKA-15552:
----------------------------------------

mimaison commented on code in PR #560:
URL: https://github.com/apache/kafka-site/pull/560#discussion_r1407624587


##########
36/upgrade.html:
##########
@@ -84,7 +84,9 @@ <h5><a id="upgrade_360_kraft" href="#upgrade_360_kraft">Upgrading KRaft-based cl
 <h5><a id="upgrade_360_notable" href="#upgrade_360_notable">Notable changes in 3.6.0</a></h5>
 <ul>
-    <li>Apache Kafka now supports having both an IPv4 and an IPv6 listener on the same port. This change only applies to
+    <li>ZooKeeper to KRaft migrations are now recommended for production usage. One significant issue was found in the
+        3.6.0 release which affects transactional producers https://issues.apache.org/jira/browse/KAFKA-15552.</li>

Review Comment:
   This information is very important to communicate to our users. Today, users who don't closely follow the mailing list or Jira (which is most users) only know that migration is now production ready, and unfortunately they are likely to run into issues if they run it.

   Can we slightly rephrase the sentence to address the comments from Justine and Ismael and push that to the website soon?



> Duplicate Producer ID blocks during ZK migration
> ------------------------------------------------
>
>                 Key: KAFKA-15552
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15552
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.4.0, 3.5.0, 3.4.1, 3.6.0, 3.5.1
>            Reporter: David Arthur
>            Assignee: David Arthur
>            Priority: Critical
>             Fix For: 3.5.2, 3.6.1
>
>
> When migrating producer ID blocks from ZK to KRaft, we are taking the current
> producer ID block from ZK and writing its "firstProducerId" into the
> producer IDs KRaft record. However, in KRaft we store the _next_ producer ID
> block in the log rather than storing the current block like ZK does. The end
> result is that the first block given to a caller of AllocateProducerIds is a
> duplicate of the last block allocated in ZK mode.
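The mismatch described above (ZK records the current block, KRaft records the next one) can be illustrated with a small model. This is only a sketch and not Kafka code: `BLOCK_SIZE`, `migrate_buggy`, `migrate_fixed` and `allocate_from_kraft` are hypothetical names, the block length of 1000 is an assumption, and `migrate_fixed` shows one plausible correction rather than the actual patch.

```python
# Toy model of the off-by-one-block problem. All names here are hypothetical;
# none of this is Kafka's real code or API.
BLOCK_SIZE = 1000  # assumed block length, for illustration only


def migrate_buggy(zk_current_first_id: int) -> int:
    """Copy the start of ZK's *current* block into the KRaft record unchanged."""
    return zk_current_first_id


def migrate_fixed(zk_current_first_id: int) -> int:
    """Store the start of the *next* block, which is what the KRaft record means."""
    return zk_current_first_id + BLOCK_SIZE


def allocate_from_kraft(next_first_id: int) -> range:
    """Hand out the block that starts at the stored 'next' producer ID."""
    return range(next_first_id, next_first_id + BLOCK_SIZE)


# Last block handed out while the cluster was still in ZK mode.
zk_last_block = range(376000, 376000 + BLOCK_SIZE)

# First block handed out after migration, with and without the off-by-one.
buggy_block = allocate_from_kraft(migrate_buggy(zk_last_block.start))
fixed_block = allocate_from_kraft(migrate_fixed(zk_last_block.start))

print("buggy block overlaps ZK block:", buggy_block == zk_last_block)
print("fixed block overlaps ZK block:", not set(fixed_block).isdisjoint(zk_last_block))
```

In this model the buggy path returns exactly the block that ZK already handed out, so two producers can end up with the same ID, while the adjusted path starts at the following block.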
>
> This can result in duplicate producer IDs being given to transactional or
> idempotent producers. In the case of transactional producers, this can cause
> long-term problems since the producer IDs are persisted and reused for a long
> time.
> The bug can occur in the window between the last producer ID block being
> allocated by the ZK controller and all the brokers being restarted following
> the metadata migration.
>
> Symptoms of this bug will include ReplicaManager OutOfOrderSequenceException
> and possibly some producer epoch validation errors. To see if a cluster is
> affected by this bug, search for the offending producer ID and see if it is
> being used by more than one producer.
>
> For example, the following error was observed:
> {code}
> Out of order sequence number for producer 376000 at offset 381338 in
> partition REDACTED: 0 (incoming seq. number), 21 (current end sequence
> number)
> {code}
> Then, searching for "376000" in the
> org.apache.kafka.clients.producer.internals.TransactionManager logs, two
> brokers both show the same producer ID being provisioned:
> {code}
> Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1
> Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
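The check suggested in the issue (search for a producer ID and see whether more than one producer is using it) could be scripted roughly as follows. This is a sketch under assumptions: the regular expression is modelled only on the two sample log lines quoted in the issue, `find_duplicate_producer_ids` is a hypothetical helper, and real log prefixes may differ and require adjusting the pattern.

```python
import re
from collections import defaultdict

# Pattern modelled on the sample TransactionManager lines quoted in the issue.
# Assumption: the client ID is the first field inside the "[Producer ...]" prefix.
PRODUCER_ID_LINE = re.compile(
    r"\[Producer clientId=(?P<client>[^\],]+)[^\]]*\] "
    r"ProducerId set to (?P<pid>\d+) with epoch (?P<epoch>\d+)"
)


def find_duplicate_producer_ids(lines):
    """Return {producer_id: set of client IDs} for IDs seen with more than one client."""
    clients_by_pid = defaultdict(set)
    for line in lines:
        match = PRODUCER_ID_LINE.search(line)
        if match:
            clients_by_pid[int(match.group("pid"))].add(match.group("client"))
    return {pid: clients for pid, clients in clients_by_pid.items() if len(clients) > 1}


if __name__ == "__main__":
    sample = [
        "Broker 0 [Producer clientId=REDACTED-0] ProducerId set to 376000 with epoch 1",
        "Broker 5 [Producer clientId=REDACTED-1] ProducerId set to 376000 with epoch 1",
    ]
    for pid, clients in find_duplicate_producer_ids(sample).items():
        print(f"producer ID {pid} is used by: {sorted(clients)}")
```

Grouping by client ID will miss the case where two separate processes share the same `clientId`, so a clean result from a script like this does not rule the bug out.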