[
https://issues.apache.org/jira/browse/KAFKA-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Jacot resolved KAFKA-20322.
---------------------------------
Reviewer: David Jacot
Resolution: Fixed
> TransactionMarkerChannelManager has discoverBrokerVersions=false causing
> UnsupportedVersionException during rolling upgrades
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-20322
> URL: https://issues.apache.org/jira/browse/KAFKA-20322
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 4.2.0
> Reporter: Ritika Reddy
> Assignee: Ritika Reddy
> Priority: Blocker
> Fix For: 4.3.0, 4.2.1
>
>
> *BUG:* TransactionMarkerChannelManager does not discover broker API versions,
> causing UnsupportedVersionException during rolling upgrades.
>
>
> [KIP-1228|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest]
> added WriteTxnMarkersRequest v2 with a TransactionVersion field. However,
> TransactionMarkerChannelManager creates its NetworkClient with
> discoverBrokerVersions=false, which disables API version negotiation with
> peer brokers. Without version discovery, the ApiVersions cache is never
> populated, so apiVersions.get(nodeId) returns null in
> NetworkClient.doSend(). The client then falls through to
> builder.latestAllowedVersion(), which uses the highest version the sending
> broker knows about rather than negotiating a mutually supported version.
> TransactionMarkerChannelManager is the only inter-broker NetworkClient that
> sets discoverBrokerVersions=false; all others (ReplicaFetcherThread,
> AlterPartitionManager, ForwardingManager, etc.) correctly use true.
> *IMPACT:* When a 4.2+ broker is the transaction coordinator and needs to
> write markers to a partition leader running 4.1 or earlier, it sends
> WriteTxnMarkersRequest v2, which the older broker does not support, causing
> UnsupportedVersionException. Transaction markers are never written, leaving
> transactions stuck in PrepareCommit/PrepareAbort. This prevents the LSO
> (Last Stable Offset) from advancing, which blocks all read_committed
> consumers on affected partitions. The issue resolves itself once all
> brokers are upgraded to the same version, but transactions stuck during the
> mixed-version window remain blocked until the coordinator retries
> successfully.
> *AFFECTED CODE:*
> core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L98
> *SOLUTION:* Set discoverBrokerVersions to true in
> TransactionMarkerChannelManager's NetworkClient constructor. This enables the
> client to send ApiVersionsRequest when connecting to each broker, populate
> the ApiVersions cache, and use latestUsableVersion() to negotiate the
> correct request version. The fix is a one-line change.
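
The version-selection behavior described in the issue can be illustrated with a small, self-contained sketch. This is not Kafka's actual code: the class and method names (VersionSelectionSketch, VersionRange, ApiVersionsCache, chooseVersion) are hypothetical stand-ins, and the version ranges are illustrative values chosen only so that the sender knows v2 and the peer does not. It shows how an empty cache leads to the sender's latest version, while a populated cache leads to the highest mutually supported version.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionSelectionSketch {

    /** Inclusive range of request versions supported for one API (hypothetical type). */
    static final class VersionRange {
        final short min;
        final short max;

        VersionRange(short min, short max) {
            this.min = min;
            this.max = max;
        }
    }

    /** Simplified stand-in for a per-node API versions cache. */
    static final class ApiVersionsCache {
        private final Map<Integer, VersionRange> byNode = new HashMap<>();

        /** Called once version discovery has completed for a node. */
        void update(int nodeId, VersionRange range) {
            byNode.put(nodeId, range);
        }

        /** Returns null when the node's versions were never discovered. */
        VersionRange get(int nodeId) {
            return byNode.get(nodeId);
        }
    }

    /**
     * Picks the request version to send to a node.
     * With no cached peer versions, falls back to the sender's latest version
     * (the failure mode in the issue); otherwise uses the highest version
     * that both sides support.
     */
    static short chooseVersion(ApiVersionsCache cache, int nodeId, VersionRange local) {
        VersionRange remote = cache.get(nodeId);
        if (remote == null) {
            // No discovery happened: nothing to negotiate against.
            return local.max;
        }
        short usableMin = (short) Math.max(local.min, remote.min);
        short usableMax = (short) Math.min(local.max, remote.max);
        if (usableMin > usableMax) {
            throw new IllegalStateException("No mutually supported version for node " + nodeId);
        }
        return usableMax;
    }

    public static void main(String[] args) {
        // Illustrative ranges: the sender knows v2, the older peer only up to v1.
        VersionRange local = new VersionRange((short) 0, (short) 2);
        VersionRange olderPeer = new VersionRange((short) 0, (short) 1);
        int peerNodeId = 1;

        ApiVersionsCache cache = new ApiVersionsCache();

        // Discovery disabled: the cache stays empty, so the sender picks v2.
        System.out.println("without discovery: v" + chooseVersion(cache, peerNodeId, local)); // v2

        // Discovery enabled: the cache is populated on connect, so the sender picks v1.
        cache.update(peerNodeId, olderPeer);
        System.out.println("with discovery: v" + chooseVersion(cache, peerNodeId, local)); // v1
    }
}
```

In the first call the older peer would reject the request, which corresponds to the UnsupportedVersionException in the report; in the second call the request is downgraded to a version the peer accepts.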