[ https://issues.apache.org/jira/browse/KAFKA-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Jacot resolved KAFKA-20322.
---------------------------------
      Reviewer: David Jacot
    Resolution: Fixed

> TransactionMarkerChannelManager has discoverBrokerVersions=false causing 
> UnsupportedVersionException during rolling upgrades
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20322
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20322
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 4.2.0
>            Reporter: Ritika Reddy
>            Assignee: Ritika Reddy
>            Priority: Blocker
>             Fix For: 4.3.0, 4.2.1
>
>
> *BUG:* TransactionMarkerChannelManager does not discover broker API versions, 
> causing UnsupportedVersionException during rolling upgrades.
>
> [KIP-1228|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest]
>  added WriteTxnMarkersRequest v2 with a TransactionVersion field. However, 
> TransactionMarkerChannelManager creates its NetworkClient with 
> discoverBrokerVersions=false, which disables API version negotiation with 
> peer brokers. Without version discovery, the ApiVersions cache is never 
> populated, so apiVersions.get(nodeId) returns null in NetworkClient.doSend(). 
> The client then falls through to builder.latestAllowedVersion(), which 
> blindly uses the highest version the sending broker knows about rather than 
> negotiating a mutually supported version. TransactionMarkerChannelManager is 
> the only inter-broker NetworkClient that sets discoverBrokerVersions=false; 
> all others (ReplicaFetcherThread, AlterPartitionManager, ForwardingManager, 
> etc.) correctly use true.
> *IMPACT:* When a 4.2+ broker is the transaction coordinator and needs to 
> write markers to a 4.1 or earlier broker (partition leader), it sends 
> WriteTxnMarkersRequest v2, which the older broker doesn't support, causing 
> UnsupportedVersionException. Transaction markers are never written, leaving 
> transactions stuck in PrepareCommit/PrepareAbort. This prevents the LSO 
> (Last Stable Offset) from advancing, which blocks all read_committed 
> consumers on affected partitions. The issue is self-resolving once all 
> brokers are upgraded to the same version, but transactions stuck during the 
> mixed-version window remain blocked until the coordinator retries 
> successfully.
> *AFFECTED CODE:* 
> core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L98|http://example.com/]
> *SOLUTION:* Set discoverBrokerVersions to true in 
> TransactionMarkerChannelManager's NetworkClient constructor. This enables the 
> client to send ApiVersionsRequest when connecting to each broker, populate 
> the ApiVersions cache, and use latestUsableVersion() to negotiate the 
> correct request version. This is a one-line change.
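
For readers following along, the version-selection behaviour described above can be sketched roughly as follows. This is a simplified illustration and not the actual NetworkClient code: the class VersionNegotiationSketch, the method chooseVersion, and the plain Map standing in for the ApiVersions cache are all hypothetical, and the assumption that a 4.1 broker tops out at WriteTxnMarkersRequest v1 is inferred from the description rather than checked against the protocol definition.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionNegotiationSketch {

    /**
     * Picks a request version for a peer broker.
     *
     * peerMaxVersions is a hypothetical stand-in for the ApiVersions cache:
     * nodeId -> highest version of this API that the peer advertised.
     * An absent entry models discoverBrokerVersions=false (nothing was ever discovered).
     */
    static short chooseVersion(Map<Integer, Short> peerMaxVersions,
                               int nodeId,
                               short oldestAllowed,
                               short latestAllowed) {
        Short peerMax = peerMaxVersions.get(nodeId);
        if (peerMax == null) {
            // No discovery data: fall back to the sender's latest version,
            // regardless of what the peer can actually handle.
            return latestAllowed;
        }
        // Discovery data present: use the highest version both sides support.
        short usable = (short) Math.min(latestAllowed, peerMax);
        if (usable < oldestAllowed) {
            throw new IllegalStateException("no mutually supported version for node " + nodeId);
        }
        return usable;
    }

    public static void main(String[] args) {
        Map<Integer, Short> cache = new HashMap<>();

        // Without discovery the cache stays empty, so the sender picks v2.
        System.out.println("without discovery: v" + chooseVersion(cache, 1, (short) 0, (short) 2));

        // With discovery, the peer (assumed to support up to v1) is recorded first.
        cache.put(1, (short) 1);
        System.out.println("with discovery: v" + chooseVersion(cache, 1, (short) 0, (short) 2));
    }
}
```

In this sketch the empty-cache branch plays the role of the latestAllowedVersion() fallback, and the populated-cache branch plays the role of latestUsableVersion(); the real client also takes the peer's minimum supported version into account, which is omitted here for brevity.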



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
