merlimat opened a new pull request, #25945: URL: https://github.com/apache/pulsar/pull/25945
## Motivation The metadata-driven (v5, PIP-473) transaction coordinator and the legacy (v4) coordinator must serve their own clients on the **same cluster**, and the v5 coordinator should be enabled by default. Before this change, the broker chose the coordinator from a global config flag, so enabling the v5 TC routed **every** client — including v4 SDK clients on `persistent://` topics — to the v5 coordinator. That breaks v4 transactions: the v5 TC notifies participants via metadata-store events, which the legacy transaction buffer / pending-ack store don't consume. This PR routes coordination by **client/SDK kind** instead of broker capability, then flips the defaults so v5 is on out of the box while v4 keeps working. ## Modifications **Routing by SDK kind** - New internal `ClientConfigurationData.scalableTransactions` flag — *not* on the public `ClientBuilder`. `PulsarClientBuilderV5` sets it `true`; the v4 builder never does. - `TransactionCoordinatorClientImpl.selectDiscovery()` routes by that flag: v5 SDK → metadata-store coordinator (assignment watch), v4 SDK → legacy assign-topic coordinator. A v5 SDK on an `http://` service URL now fails clearly rather than silently downgrading. - Wire: an optional `scalable` flag on `CommandTcClientConnectRequest`, `CommandNewTxn`, `CommandEndTxn`, `CommandAddPartitionToTxn`, `CommandAddSubscriptionToTxn` (additive, defaults false). `ServerCnx` routes each command to the v5 TC when `scalable=true` (rejecting it if the v5 TC isn't enabled), else to the legacy TC; `verifyTxnOwnership` is parameterized by the same flag. No shared `TxnID` namespace is needed: the two coordinators keep separate state, and a client only ever talks to one kind, so a bare `TxnID` is never looked up across both. **Default flip (P5.4)** - `transactionCoordinatorScalableTopicsEnabled` → `true` - `transactionBufferProviderClassName` → `DispatchingTransactionBufferProvider` - `transactionPendingAckStoreProviderClassName` → `DispatchingTransactionPendingAckStoreProvider` The dispatching providers route `segment://` topics to the metadata-driven implementations and `persistent://` / `topic://` to the legacy ones by `TopicName.isSegment()`, so v4 topics get the unchanged legacy buffer/ack-store. ## Verification - Unit: `CommandsScalableTxnFlagTest` round-trips the `scalable` flag on all five commands (present + default-false). - Integration (`TcMetadataDiscoveryTest`, verified locally against a freshly built docker image — **all pass**): on a v5-enabled cluster, a **v5 SDK** client drives the transaction lifecycle and survives a leader-broker kill via the metadata coordinator, while a **v4 SDK** client runs a transaction on a `persistent://` topic via the legacy coordinator — concurrently. Asserts the v4 client does **not** use metadata-store discovery. ## Notes for reviewers - This adds client-broker wire-protocol surface (the additive `scalable` flag); flagging for protocol review. - The enable-by-default flip means transaction-enabled broker tests now also start the v5 TC; CI exercises that breadth. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
