[
https://issues.apache.org/jira/browse/KAFKA-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ritika Reddy updated KAFKA-20322:
---------------------------------
Description:
*BUG:* TransactionMarkerChannelManager does not discover broker API versions,
causing UnsupportedVersionException during rolling upgrades
[KIP-1228|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest]
added WriteTxnMarkersRequest v2 with a TransactionVersion field. However,
TransactionMarkerChannelManager creates its NetworkClient with
discoverBrokerVersions=false, which disables API version negotiation with peer
brokers. Without version discovery, the ApiVersions cache is never populated, so
apiVersions.get(nodeId) returns null in NetworkClient.doSend(). The client then
falls through to builder.latestAllowedVersion(), which uses the highest version
the sending broker knows about instead of negotiating a mutually supported
version. TransactionMarkerChannelManager is the only inter-broker NetworkClient
that sets discoverBrokerVersions=false; all others (ReplicaFetcherThread,
AlterPartitionManager, ForwardingManager, etc.) correctly use true.
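The version choice described above can be sketched as follows. This is a simplified, self-contained model and not Kafka's actual code: the class VersionSelectionSketch, the VersionRange type, and chooseVersion() are hypothetical names, and the version ranges in main() (v1 only for an older broker, v1 to v2 for a 4.2 broker) are assumptions made for illustration.

```java
// Simplified model of the request-version choice; not Kafka's real classes.
class VersionSelectionSketch {

    /** Inclusive range of versions one party supports for a single API (hypothetical type). */
    static final class VersionRange {
        final int min;
        final int max;

        VersionRange(int min, int max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Picks the version to send. A null remote range models an empty
     * ApiVersions cache entry, i.e. no version discovery took place.
     */
    static int chooseVersion(VersionRange local, VersionRange remote) {
        if (remote == null) {
            // Fallback path: the sender's newest version, whatever the peer supports.
            return local.max;
        }
        // Negotiated path: the highest version inside both ranges.
        int usable = Math.min(local.max, remote.max);
        if (usable < Math.max(local.min, remote.min)) {
            throw new IllegalStateException("no mutually supported version");
        }
        return usable;
    }

    public static void main(String[] args) {
        VersionRange coordinator = new VersionRange(1, 2); // assumed range on a 4.2 broker
        VersionRange olderLeader = new VersionRange(1, 1); // assumed range on a 4.1 broker

        System.out.println(chooseVersion(coordinator, null));        // prints 2
        System.out.println(chooseVersion(coordinator, olderLeader)); // prints 1
    }
}
```

With an empty cache the sketch returns v2 even though the peer only understands v1, which is the mismatch this issue reports.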
*IMPACT:* When a 4.2+ broker is the transaction coordinator and needs to write
markers to a partition leader running 4.1 or earlier, it sends
WriteTxnMarkersRequest v2, which the older broker does not support, causing
UnsupportedVersionException. Transaction markers are never written, leaving
transactions stuck in PrepareCommit/PrepareAbort. This prevents the LSO (Last
Stable Offset) from advancing, which blocks all read_committed consumers on
affected partitions. The issue is self-resolving once all brokers are upgraded
to the same version, but transactions stuck during the mixed-version window
remain blocked until the coordinator retries successfully.
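To illustrate why a missing marker blocks read_committed consumers, here is a minimal sketch of how the LSO relates to open transactions. It assumes the usual definition (the LSO is the first offset of the earliest still-open transaction, or the high watermark when none is open); the class LastStableOffsetSketch, the method name, and the offsets are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Toy model of the last stable offset on one partition; not broker code.
class LastStableOffsetSketch {

    /**
     * Returns the high watermark when no transaction is open, otherwise the
     * smallest first offset among the open transactions.
     */
    static long lastStableOffset(long highWatermark, List<Long> openTxnFirstOffsets) {
        long lso = highWatermark;
        for (long firstOffset : openTxnFirstOffsets) {
            lso = Math.min(lso, firstOffset);
        }
        return lso;
    }

    public static void main(String[] args) {
        long highWatermark = 500L;

        // A transaction started at offset 120 and its marker was never written.
        System.out.println(lastStableOffset(highWatermark, Collections.singletonList(120L))); // prints 120

        // Once the marker is written the transaction is no longer open.
        System.out.println(lastStableOffset(highWatermark, Collections.emptyList()));         // prints 500
    }
}
```

In the first case a read_committed consumer cannot read beyond offset 120, even though the log extends to 500.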
*AFFECTED CODE:*
core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L98|http://example.com/]
*SOLUTION:* Set discoverBrokerVersions to true in
TransactionMarkerChannelManager's NetworkClient constructor. This enables the
client to send ApiVersionsRequest when connecting to each broker, populate the
ApiVersions cache, and use latestUsableVersion() to negotiate the correct
request version. This is a one-line change.
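The effect of flipping the flag can be modelled with a toy client, shown below. This is only a sketch: DiscoveryFlagSketch, connect(), and versionToSend() are hypothetical names, the version handshake is reduced to recording the peer's maximum version, and the version numbers are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Toy client showing how the discovery flag changes the version sent; not NetworkClient.
class DiscoveryFlagSketch {
    private final boolean discoverBrokerVersions;
    private final int localMaxVersion;
    // Stands in for the ApiVersions cache: node id -> highest version the peer reported.
    private final Map<Integer, Integer> peerMaxVersions = new HashMap<>();

    DiscoveryFlagSketch(boolean discoverBrokerVersions, int localMaxVersion) {
        this.discoverBrokerVersions = discoverBrokerVersions;
        this.localMaxVersion = localMaxVersion;
    }

    /** Models connection setup; peerMaxVersion is what the peer would report if asked. */
    void connect(int nodeId, int peerMaxVersion) {
        if (discoverBrokerVersions) {
            // Only a discovering client asks the peer and caches the answer.
            peerMaxVersions.put(nodeId, peerMaxVersion);
        }
    }

    /** Uses the cached peer version when present, otherwise the local maximum. */
    int versionToSend(int nodeId) {
        Integer peerMax = peerMaxVersions.get(nodeId);
        return peerMax == null ? localMaxVersion : Math.min(localMaxVersion, peerMax);
    }

    public static void main(String[] args) {
        DiscoveryFlagSketch before = new DiscoveryFlagSketch(false, 2);
        before.connect(1, 1);
        System.out.println(before.versionToSend(1)); // prints 2, which the peer would reject

        DiscoveryFlagSketch after = new DiscoveryFlagSketch(true, 2);
        after.connect(1, 1);
        System.out.println(after.versionToSend(1));  // prints 1
    }
}
```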
was:
Bug: TransactionMarkerChannelManager does not discover broker API versions,
causing UnsupportedVersionException during rolling
upgrades
[KIP-1228|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest]
(KAFKA-19446) added WriteTxnMarkersRequest v2 with a TransactionVersion field.
However, TransactionMarkerChannelManager creates its NetworkClient with
discoverBrokerVersions=false, which disables API version negotiation with peer
brokers. Without version discovery, the ApiVersions cache is never populated —
apiVersions.get(nodeId) returns null in NetworkClient.doSend(), causing it to
fall through to builder.latestAllowedVersion() which blindly uses the highest
version the sending broker knows about rather than negotiating a mutually
supported version. TransactionMarkerChannelManager is the only inter-broker
NetworkClient that sets discoverBrokerVersions=false; all others
(ReplicaFetcherThread, AlterPartitionManager, ForwardingManager, etc.)
correctly use true.
*IMPACT:* When a 4.2+ broker is the transaction coordinator and needs to write
markers to a 4.1 or earlier broker (partition leader),
it sends WriteTxnMarkersRequest v2, which the older broker doesn't support,
causing UnsupportedVersionException. Transaction markers are never written,
leaving transactions stuck in PrepareCommit/PrepareAbort. This prevents the LSO
(Last Stable Offset) from
advancing, which blocks all read_committed consumers on affected partitions.
The issue is self-resolving once all brokers are
upgraded to the same version, but transactions stuck during the mixed-version
window remain blocked until the coordinator retries
successfully.
*AFFECTED CODE:*
core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L98|http://example.com]
SOLUTION: Set discoverBrokerVersions to true in
TransactionMarkerChannelManager's NetworkClient constructor. This enables the
client
to send ApiVersionsRequest when connecting to each broker, populate the
ApiVersions cache, and use latestUsableVersion() to
negotiate the correct request version. A one-line change.
> TransactionMarkerChannelManager has discoverBrokerVersions=false causing
> UnsupportedVersionException during rolling upgrades
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-20322
> URL: https://issues.apache.org/jira/browse/KAFKA-20322
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 4.2.0
> Reporter: Ritika Reddy
> Assignee: Ritika Reddy
> Priority: Major
> Fix For: 4.2.1
>