merlimat opened a new pull request, #25945:
URL: https://github.com/apache/pulsar/pull/25945

   ## Motivation
   
   The metadata-driven (v5, PIP-473) transaction coordinator and the legacy 
(v4) coordinator must serve their own clients on the **same cluster**, and the 
v5 coordinator should be enabled by default. Before this change, the broker 
chose the coordinator from a global config flag, so enabling the v5 TC routed 
**every** client — including v4 SDK clients on `persistent://` topics — to the 
v5 coordinator. That breaks v4 transactions: the v5 TC notifies participants 
via metadata-store events, which the legacy transaction buffer / pending-ack 
store don't consume.
   
   This PR routes coordination by **client/SDK kind** instead of broker 
capability, then flips the defaults so v5 is on out of the box while v4 keeps 
working.
   
   ## Modifications
   
   **Routing by SDK kind**
   - New internal `ClientConfigurationData.scalableTransactions` flag — *not* 
on the public `ClientBuilder`. `PulsarClientBuilderV5` sets it `true`; the v4 
builder never does.
   - `TransactionCoordinatorClientImpl.selectDiscovery()` routes by that flag: 
v5 SDK → metadata-store coordinator (assignment watch), v4 SDK → legacy 
assign-topic coordinator. A v5 SDK on an `http://` service URL now fails 
clearly rather than silently downgrading.
   - Wire: an optional `scalable` flag on `CommandTcClientConnectRequest`, 
`CommandNewTxn`, `CommandEndTxn`, `CommandAddPartitionToTxn`, 
`CommandAddSubscriptionToTxn` (additive, defaults false). `ServerCnx` routes 
each command to the v5 TC when `scalable=true` (rejecting it if the v5 TC isn't 
enabled), else to the legacy TC; `verifyTxnOwnership` is parameterized by the 
same flag.
   
   No shared `TxnID` namespace is needed: the two coordinators keep separate 
state, and a client only ever talks to one kind, so a bare `TxnID` is never 
looked up across both.
   
   **Default flip (P5.4)**
   - `transactionCoordinatorScalableTopicsEnabled` → `true`
   - `transactionBufferProviderClassName` → 
`DispatchingTransactionBufferProvider`
   - `transactionPendingAckStoreProviderClassName` → 
`DispatchingTransactionPendingAckStoreProvider`
   
   The dispatching providers route `segment://` topics to the metadata-driven 
implementations and `persistent://` / `topic://` to the legacy ones by 
`TopicName.isSegment()`, so v4 topics get the unchanged legacy buffer/ack-store.
   
   ## Verification
   
   - Unit: `CommandsScalableTxnFlagTest` round-trips the `scalable` flag on all 
five commands (present + default-false).
   - Integration (`TcMetadataDiscoveryTest`, verified locally against a freshly 
built docker image — **all pass**): on a v5-enabled cluster, a **v5 SDK** 
client drives the transaction lifecycle and survives a leader-broker kill via 
the metadata coordinator, while a **v4 SDK** client runs a transaction on a 
`persistent://` topic via the legacy coordinator — concurrently. Asserts the v4 
client does **not** use metadata-store discovery.
   
   ## Notes for reviewers
   
   - This adds client-broker wire-protocol surface (the additive `scalable` 
flag); flagging for protocol review.
   - The enable-by-default flip means transaction-enabled broker tests now also 
start the v5 TC; CI exercises that breadth.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to