merlimat opened a new pull request, #25959: URL: https://github.com/apache/pulsar/pull/25959
### Motivation Scalable-topics (v5) transactions never timed out at the requested value: a 1-minute timeout was applied as **~16.7 hours**, so the coordinator's timeout sweep effectively never aborted a dangling (never-committed) transaction, leaving its transaction buffer pinned indefinitely (which also blocks visibility of later writes on the same segment). **Root cause.** The `NEW_TXN` handler's v5 branch in `ServerCnx` multiplied the wire ttl by 1000. But the ttl field has always carried **milliseconds** — the client sends `unit.toMillis(...)`, and both the legacy coordinator (`TransactionMetadataStoreService.newTransaction`, parameter `timeoutInMills`) and the v5 coordinator (`TransactionCoordinatorV5.newTransaction`, `Duration.ofMillis(...)`) consume milliseconds. The field's legacy name, `txn_ttl_seconds`, was a misnomer that invited the bug. ### Modifications - `ServerCnx.handleNewTxn`: drop the `* 1000` on the v5 path; pass the millisecond value straight through, matching the legacy path. - Rename the wire field `CommandNewTxn.txn_ttl_seconds` → `txn_ttl_millis` so the unit confusion can't recur, and update `Commands.newTxn` accordingly. - Add `V5TransactionTimeoutTest`: a never-committed v5 transaction must be aborted by the broker timeout sweep, unpinning the buffer so a later non-transactional message becomes visible. ### Compatibility The binary protocol is **unchanged**: the field number (`2`) and type (`uint64`) are identical — only the field's name and the generated Java accessor names change (`getTxnTtlSeconds` → `getTxnTtlMillis`). Older and newer clients/brokers interoperate exactly as before. ### Verifying this change - New `V5TransactionTimeoutTest` reproduces the bug (failed before the fix, passes after). - No regression: `V5TransactionTest` 7/7, `V5TransactionScalableTest` 8/8, `V5TransactionRecoveryTest` 3/3, `TransactionCoordinatorV5Test` 22/22, `CommandsScalableTxnFlagTest` 6/6; checkstyle clean. ### Does this pull request potentially affect one of the following parts: - Dependencies: no - The public API: no - The schema: no - The default values of configurations: no - The threading model: no - The binary protocol: no (field number and type unchanged; only the field name) - The REST endpoints: no - The admin CLI options: no - Anything that affects deployment: no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
