[ 
https://issues.apache.org/jira/browse/KAFKA-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ritika Reddy updated KAFKA-20322:
---------------------------------
    Description: 
 Bug: TransactionMarkerChannelManager does not discover broker API versions, 
causing UnsupportedVersionException during rolling upgrades
 
[KIP-1228|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest]
 (KAFKA-19446) added WriteTxnMarkersRequest v2 with a TransactionVersion field. 
However, TransactionMarkerChannelManager creates its NetworkClient with 
discoverBrokerVersions=false, which disables API version negotiation with peer 
brokers. Without version discovery, the ApiVersions cache is never populated: 
apiVersions.get(nodeId) returns null in NetworkClient.doSend(), so the client 
falls through to builder.latestAllowedVersion(), which uses the highest 
version the sending broker knows about instead of negotiating a mutually 
supported version. TransactionMarkerChannelManager is the only inter-broker 
NetworkClient that sets discoverBrokerVersions=false; all others 
(ReplicaFetcherThread, AlterPartitionManager, ForwardingManager, etc.) 
set it to true.
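
The fallback described above can be illustrated with a small, self-contained Java model. It is only a sketch: VersionSelectionSketch, VersionRange and chooseVersion are hypothetical names rather than Kafka classes, and the version ranges are illustrative assumptions (the report only states that 4.1 and earlier brokers do not support v2).

```java
// Simplified model of request-version selection. All names here are
// hypothetical; this is not Kafka's NetworkClient code.
final class VersionSelectionSketch {

    /** Inclusive range of request versions supported by one side. */
    static final class VersionRange {
        final short min;
        final short max;

        VersionRange(short min, short max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Picks the version to send. A null peerRange models an ApiVersions
     * cache that was never populated because discovery is disabled.
     */
    static short chooseVersion(VersionRange sender, VersionRange peerRange) {
        if (peerRange == null) {
            // No discovery: fall back to the sender's own highest version.
            return sender.max;
        }
        // With discovery: highest version that lies inside both ranges.
        short usable = (short) Math.min(sender.max, peerRange.max);
        if (usable < sender.min || usable < peerRange.min) {
            throw new IllegalStateException("no mutually supported version");
        }
        return usable;
    }

    public static void main(String[] args) {
        VersionRange coordinator = new VersionRange((short) 1, (short) 2); // assumed range
        VersionRange olderLeader = new VersionRange((short) 1, (short) 1); // assumed range

        // Discovery disabled: v2 is chosen although the older leader cannot handle it.
        System.out.println(chooseVersion(coordinator, null));        // prints 2
        // Discovery enabled: the mutually supported v1 is chosen.
        System.out.println(chooseVersion(coordinator, olderLeader)); // prints 1
    }
}
```

In this model the only difference between the failing and the working case is whether the peer's range is known when the version is chosen.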

*IMPACT:* When a 4.2+ broker is the transaction coordinator and needs to write 
markers to a partition leader running 4.1 or earlier, it sends 
WriteTxnMarkersRequest v2, which the older broker does not support, causing 
UnsupportedVersionException. Transaction markers are never written, leaving 
transactions stuck in PrepareCommit/PrepareAbort. This prevents the LSO (Last 
Stable Offset) from advancing, which blocks all read_committed consumers on 
the affected partitions. The issue resolves itself once all brokers are 
upgraded to the same version, but transactions stuck during the mixed-version 
window remain blocked until a coordinator retry succeeds.
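
To make the consumer impact concrete, the following Java sketch models how an open transaction holds back the LSO. It assumes the usual definition (the LSO is the first offset of the earliest still-open transaction, or the high watermark when no transaction is open). LastStableOffsetSketch and the offsets used are hypothetical illustration values, not taken from the broker code.

```java
// Simplified model of the Last Stable Offset (LSO). Names and offsets are
// hypothetical; this is not the broker's implementation.
final class LastStableOffsetSketch {

    /**
     * Returns the high watermark when no transaction is open, otherwise the
     * smallest first offset among the open transactions.
     */
    static long lastStableOffset(long highWatermark, long... openTxnFirstOffsets) {
        long lso = highWatermark;
        for (long firstOffset : openTxnFirstOffsets) {
            lso = Math.min(lso, firstOffset);
        }
        return lso;
    }

    public static void main(String[] args) {
        long highWatermark = 500L;

        // A transaction that began at offset 120 is stuck in PrepareCommit
        // because its marker was never written.
        System.out.println(lastStableOffset(highWatermark, 120L)); // prints 120

        // Once the marker is written the transaction is no longer open.
        System.out.println(lastStableOffset(highWatermark));       // prints 500
    }
}
```

In this model a read_committed consumer is limited to records below offset 120 until the marker is written, no matter how far the high watermark advances.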

*AFFECTED CODE:* 
core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala

  
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L98]

*SOLUTION:* Set discoverBrokerVersions to true in 
TransactionMarkerChannelManager's NetworkClient constructor. This enables the 
client to send ApiVersionsRequest when connecting to each broker, populate the 
ApiVersions cache, and use latestUsableVersion() to negotiate the correct 
request version. This is a one-line change.

  was:
This bug was introduced by 
[KIP-1228|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest],
 which added a new WriteTxnMarkersRequest version (v2). However, 
TransactionMarkerChannelManager still creates its NetworkClient with 
discoverBrokerVersions=false, which disables API version negotiation with peer 
brokers. As a result, the transaction coordinator sends 
WriteTxnMarkersRequest at its own latest supported version, which fails with 
UnsupportedVersionException when the target broker runs an older release 
that does not support that version. Transactions get permanently stuck in 
PrepareCommit or PrepareAbort during rolling upgrades.

*ROOT CAUSE:* TransactionMarkerChannelManager creates NetworkClient with 
discoverBrokerVersions=false, which disables API version discovery. Without 
version discovery, NetworkClient blindly uses the latest API version for all 
requests. Discovery should be set to true for compatibility.

*IMPACT:* When a 4.2 broker is the transaction coordinator and needs to write 
markers to a 4.1 or earlier broker (partition leader), it sends 
WriteTxnMarkersRequest v2, which the older broker doesn't support, causing 
UnsupportedVersionException.

*AFFECTED CODE:* origin/trunk: 
core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala

[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L98]

*SOLUTION:* Set version discovery flag to true in the Transaction Marker 
Channel Manager

 


> TransactionMarkerChannelManager has discoverBrokerVersions=false causing 
> UnsupportedVersionException during rolling upgrades
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20322
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20322
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 4.2.0
>            Reporter: Ritika Reddy
>            Assignee: Ritika Reddy
>            Priority: Major
>             Fix For: 4.2.1
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
