[ 
https://issues.apache.org/jira/browse/KAFKA-20322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ritika Reddy updated KAFKA-20322:
---------------------------------
    Description: 
*BUG:* TransactionMarkerChannelManager does not discover broker API versions, 
causing UnsupportedVersionException during rolling upgrades.

[KIP-1228|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest]
 added WriteTxnMarkersRequest v2 with a TransactionVersion field. However, 
TransactionMarkerChannelManager creates its NetworkClient with 
discoverBrokerVersions=false, which disables API version negotiation with peer 
brokers. Without version discovery, the ApiVersions cache is never populated, 
so apiVersions.get(nodeId) returns null in NetworkClient.doSend(). doSend() 
then falls back to builder.latestAllowedVersion(), which uses the highest 
version the sending broker knows about instead of negotiating a mutually 
supported version. TransactionMarkerChannelManager is the only inter-broker 
NetworkClient that sets discoverBrokerVersions=false; all others 
(ReplicaFetcherThread, AlterPartitionManager, ForwardingManager, etc.) 
correctly use true.
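
The fallback described above can be illustrated with a small, self-contained sketch. This is a simplified model, not the actual NetworkClient code: the class name, the map standing in for the ApiVersions cache, and the assumption that the older broker supports WriteTxnMarkersRequest only up to v1 are all hypothetical and chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the version selection described above.
// It is NOT the real NetworkClient implementation.
class VersionSelectionSketch {
    // Stand-in for the ApiVersions cache: node id -> highest request version
    // that node advertised. It stays empty when discovery is disabled.
    static final Map<Integer, Short> apiVersionsCache = new HashMap<>();

    static short chooseVersion(int nodeId, short oldestAllowed, short latestAllowed) {
        Short cached = apiVersionsCache.get(nodeId);
        if (cached == null) {
            // No version info for this node: fall back to the sender's latest.
            return latestAllowed;
        }
        short peerMax = cached;
        short usable = (short) Math.min(peerMax, latestAllowed);
        if (usable < oldestAllowed) {
            throw new IllegalStateException("no mutually supported version");
        }
        return usable;
    }

    public static void main(String[] args) {
        int olderBroker = 1; // assumed to support the request only up to v1

        // Cache never populated: the sender's latest version (v2) is chosen.
        System.out.println(chooseVersion(olderBroker, (short) 0, (short) 2)); // prints 2

        // Cache populated, as discovery would do: v1 is negotiated instead.
        apiVersionsCache.put(olderBroker, (short) 1);
        System.out.println(chooseVersion(olderBroker, (short) 0, (short) 2)); // prints 1
    }
}
```

The first call corresponds to the behaviour reported in this issue; the second shows what the same selection would yield once the peer's supported versions are known.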

*IMPACT:* When a 4.2+ broker is the transaction coordinator and needs to write 
markers to a 4.1 or earlier broker (partition leader), it sends 
WriteTxnMarkersRequest v2, which the older broker doesn't support, causing 
UnsupportedVersionException. Transaction markers are never written, leaving 
transactions stuck in PrepareCommit/PrepareAbort. This prevents the LSO (Last 
Stable Offset) from advancing, which blocks all read_committed consumers on 
affected partitions. The issue is self-resolving once all brokers are upgraded 
to the same version, but transactions stuck during the mixed-version window 
remain blocked until the coordinator retries successfully.

*AFFECTED CODE:* 
core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala

[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L98|http://example.com/]

*SOLUTION:* Set discoverBrokerVersions to true in 
TransactionMarkerChannelManager's NetworkClient constructor. This enables the 
client to send ApiVersionsRequest when connecting to each broker, populate the 
ApiVersions cache, and use latestUsableVersion() to negotiate the correct 
request version. This is a one-line change.
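
As a rough illustration of why flipping the flag is sufficient, the sketch below models a client whose connection setup records the peer's supported version only when discovery is enabled. The class, its methods, and the peer's maximum version of v1 are hypothetical stand-ins; the real change is the constructor argument in TransactionMarkerChannelManager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a client with a discoverBrokerVersions flag.
// The names mirror the description above but this is not Kafka code.
class DiscoveryFlagSketch {
    private final boolean discoverBrokerVersions;
    private final Map<Integer, Short> apiVersions = new HashMap<>();

    DiscoveryFlagSketch(boolean discoverBrokerVersions) {
        this.discoverBrokerVersions = discoverBrokerVersions;
    }

    // Simulates connection setup. peerMaxVersion stands in for what the peer
    // would report in response to an ApiVersionsRequest.
    void connect(int nodeId, short peerMaxVersion) {
        if (discoverBrokerVersions) {
            apiVersions.put(nodeId, peerMaxVersion);
        }
    }

    // Picks the request version: negotiated if the peer is known,
    // otherwise the sender's latest.
    short versionFor(int nodeId, short latestAllowed) {
        Short peerMax = apiVersions.get(nodeId);
        return peerMax == null ? latestAllowed : (short) Math.min(peerMax, latestAllowed);
    }

    public static void main(String[] args) {
        DiscoveryFlagSketch before = new DiscoveryFlagSketch(false);
        before.connect(1, (short) 1);
        System.out.println(before.versionFor(1, (short) 2)); // prints 2

        DiscoveryFlagSketch after = new DiscoveryFlagSketch(true);
        after.connect(1, (short) 1);
        System.out.println(after.versionFor(1, (short) 2)); // prints 1
    }
}
```

With the flag off, the model sends v2 to a peer that only understands v1; with the flag on, it settles on v1.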

  was:
 Bug: TransactionMarkerChannelManager does not discover broker API versions, 
causing UnsupportedVersionException during rolling upgrades

[KIP-1228|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest]
 (KAFKA-19446) added WriteTxnMarkersRequest v2 with a TransactionVersion field. 
However, TransactionMarkerChannelManager creates its NetworkClient with 
discoverBrokerVersions=false, which disables API version negotiation with peer 
brokers. Without version discovery, the ApiVersions cache is never populated — 
apiVersions.get(nodeId) returns null in NetworkClient.doSend(), causing it to 
fall through to builder.latestAllowedVersion() which blindly uses the highest 
version the sending broker knows about rather than negotiating a mutually 
supported version. TransactionMarkerChannelManager is the only inter-broker 
NetworkClient that sets discoverBrokerVersions=false; all others 
(ReplicaFetcherThread, AlterPartitionManager, ForwardingManager, etc.) 
correctly use true.

*IMPACT:* When a 4.2+ broker is the transaction coordinator and needs to write 
markers to a 4.1 or earlier broker (partition leader), it sends 
WriteTxnMarkersRequest v2, which the older broker doesn't support, causing 
UnsupportedVersionException. Transaction markers are never written, leaving 
transactions stuck in PrepareCommit/PrepareAbort. This prevents the LSO (Last 
Stable Offset) from advancing, which blocks all read_committed consumers on 
affected partitions. The issue is self-resolving once all brokers are upgraded 
to the same version, but transactions stuck during the mixed-version window 
remain blocked until the coordinator retries successfully.

*AFFECTED CODE:* 
core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala

  
[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L98|http://example.com]

  SOLUTION: Set discoverBrokerVersions to true in 
TransactionMarkerChannelManager's NetworkClient constructor. This enables the 
client to send ApiVersionsRequest when connecting to each broker, populate the 
ApiVersions cache, and use latestUsableVersion() to negotiate the correct 
request version. A one-line change.


> TransactionMarkerChannelManager has discoverBrokerVersions=false causing 
> UnsupportedVersionException during rolling upgrades
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20322
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20322
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 4.2.0
>            Reporter: Ritika Reddy
>            Assignee: Ritika Reddy
>            Priority: Major
>             Fix For: 4.2.1
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
