merlimat opened a new pull request, #25884:
URL: https://github.com/apache/pulsar/pull/25884

   ## Summary
   
   Adds the periodic background sweeps for the scalable-topics transaction 
coordinator landed in [#25863](https://github.com/apache/pulsar/pull/25863) 
(P5.1). Both run on a dedicated single-thread scheduler started by 
`PulsarService` and gated on assign-partition-0 ownership — only the elected 
broker sweeps each cycle. Concurrent sweeps from a stale owner remain safe 
because every state transition is a header CAS; the election is purely an 
efficiency measure (per-partition scoping comes with the partitioned TC in a 
later phase).
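
As a rough illustration of the gating described above, here is a minimal sketch of a single-thread periodic task that skips its cycle unless the broker is currently elected. `GatedSweepSketch`, `isElected`, and `tick` are hypothetical names for this example, and the `BooleanSupplier` merely stands in for the assign-partition-0 ownership check; none of this is taken from the PR's code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: run a sweep only while this broker is the elected sweeper. */
final class GatedSweepSketch {
    // Assumption: stands in for the assign-partition-0 ownership check.
    private final BooleanSupplier isElected;
    private final Runnable sweep;
    final AtomicInteger runs = new AtomicInteger();
    final AtomicInteger skips = new AtomicInteger();

    GatedSweepSketch(BooleanSupplier isElected, Runnable sweep) {
        this.isElected = isElected;
        this.sweep = sweep;
    }

    /** One scheduler cycle: skip when not elected, otherwise run the sweep. */
    void tick() {
        if (!isElected.getAsBoolean()) {
            skips.incrementAndGet();
            return;
        }
        try {
            sweep.run();
            runs.incrementAndGet();
        } catch (RuntimeException e) {
            // Swallow so the periodic task is not cancelled: an uncaught exception
            // suppresses all later executions of a scheduled task.
        }
    }

    /** Starts a dedicated single-thread scheduler; the caller shuts it down. */
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::tick, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Because the gate is re-evaluated on every cycle, a broker that loses the election simply stops sweeping at its next tick; correctness still rests on the header CAS, not on the gate.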
   
   ### Timeout sweep
   
   Default cadence **60s**. Scans the by-deadline index up to `now` and drives 
each expired open txn through `endTransaction(ABORT)`, which re-reads and 
CAS-guards the header — so a txn the client commits in the same window is left 
alone (the resulting `InvalidTxnStatusException` / `BadVersionException` is 
treated as a benign race and logged at debug).
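
The timeout path can be sketched with an in-memory model. Everything below is an assumption made for illustration: `TimeoutSweepSketch`, its `Header` record, and the `TreeMap` index are hypothetical stand-ins for the metadata-store layout, and `IllegalStateException` stands in for `InvalidTxnStatusException` / `BadVersionException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model of the timeout sweep; not the PR's implementation. */
final class TimeoutSweepSketch {
    enum Status { OPEN, COMMITTED, ABORTED }

    /** Immutable header; the version changes on every successful CAS. */
    record Header(Status status, long version) {}

    final Map<String, Header> headers = new ConcurrentHashMap<>();
    // Assumption: the by-deadline index is modelled as deadline -> txn ids.
    final NavigableMap<Long, List<String>> byDeadline = new TreeMap<>();

    void open(String txnId, long deadlineMillis) {
        headers.put(txnId, new Header(Status.OPEN, 0));
        byDeadline.computeIfAbsent(deadlineMillis, k -> new ArrayList<>()).add(txnId);
    }

    /** Re-reads the header and CAS-es it to the final state; throws if the txn is not OPEN. */
    void endTransaction(String txnId, Status finalState) {
        Header current = headers.get(txnId);
        if (current == null || current.status() != Status.OPEN) {
            throw new IllegalStateException("txn " + txnId + " is not open");
        }
        Header next = new Header(finalState, current.version() + 1);
        // replace(key, expected, new) only succeeds if nobody changed the header meanwhile.
        if (!headers.replace(txnId, current, next)) {
            throw new IllegalStateException("version conflict on " + txnId);
        }
    }

    /** Aborts every open txn whose deadline is at or before now; returns how many were aborted. */
    int sweepTimeouts(long nowMillis) {
        int aborted = 0;
        for (List<String> ids : byDeadline.headMap(nowMillis, true).values()) {
            for (String id : ids) {
                try {
                    endTransaction(id, Status.ABORTED);
                    aborted++;
                } catch (IllegalStateException benignRace) {
                    // The txn was ended concurrently (e.g. committed by the client); leave it alone.
                }
            }
        }
        return aborted;
    }
}
```

The sketch does not prune the index after a sweep, so a second pass would hit the same ids again and take the benign-race branch; index cleanup is left out to keep the example short.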
   
   ### GC sweep
   
   Default cadence **300s**, retention **900s**. For each terminal state, scans 
the by-final-state index up to `now - retention`. For each candidate:
   
   - If leftover `/txn/op` records remain (some participant hasn't applied the 
outcome yet, or never received the event, e.g. because the TC crashed between 
the header CAS and the fan-out), re-drive `fanOutEvents` and **leave the header 
in place** so the participant can re-read the true outcome. The participant 
removes its op records once it applies them, and a later GC pass, seeing no op 
records, deletes the header.
   - If no op records remain, delete the header.
   
   This ordering closes the fan-out-durability gap [lhotari raised on 
#25863](https://github.com/apache/pulsar/pull/25863#discussion_r3298435980) 
without ever stranding a committed txn's data: we never delete a header while a 
participant might still re-read it (which would default the outcome to ABORTED).
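
The per-candidate decision in the GC sweep can be sketched as follows. This is a simplified in-memory model under stated assumptions: `GcSweepSketch` and its maps are hypothetical, the by-final-state index is reduced to a map of finalization timestamps, and `fanOutEvents` only records which txns were re-driven.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the GC sweep's repair-or-delete decision. */
final class GcSweepSketch {
    // Assumption: terminal headers are modelled as txnId -> time the txn reached its final state.
    final Map<String, Long> terminalHeaders = new HashMap<>();
    // Assumption: leftover op records are modelled as txnId -> participants that still hold one.
    final Map<String, Set<String>> opRecords = new HashMap<>();
    final List<String> redriven = new ArrayList<>();

    void finalizeTxn(String txnId, long finalizedAtMillis, String... participantsWithOps) {
        terminalHeaders.put(txnId, finalizedAtMillis);
        opRecords.put(txnId, new HashSet<>(List.of(participantsWithOps)));
    }

    /** A participant applied the outcome and removed its op record. */
    void participantApplied(String txnId, String participant) {
        opRecords.getOrDefault(txnId, new HashSet<>()).remove(participant);
    }

    /** Stand-in for re-publishing the outcome to participants. */
    void fanOutEvents(String txnId) {
        redriven.add(txnId);
    }

    /** Returns the number of headers deleted in this pass. */
    int sweepGc(long nowMillis, long retentionMillis) {
        long cutoff = nowMillis - retentionMillis;
        int deleted = 0;
        Iterator<Map.Entry<String, Long>> it = terminalHeaders.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() > cutoff) {
                continue; // still inside the retention window
            }
            String txnId = entry.getKey();
            if (!opRecords.getOrDefault(txnId, Set.of()).isEmpty()) {
                fanOutEvents(txnId); // repair, and keep the header so the outcome stays readable
                continue;
            }
            it.remove(); // no op records left: safe to delete the header
            opRecords.remove(txnId);
            deleted++;
        }
        return deleted;
    }
}
```

In this model a txn with a leftover op record survives any number of passes and is only collected on the first pass after `participantApplied` clears the record, which mirrors the two-pass behaviour described above.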
   
   ### Config
   
   | Key | Default |
   |---|---|
   | `transactionCoordinatorScalableTopicsTimeoutSweepIntervalSeconds` | 60 |
   | `transactionCoordinatorScalableTopicsGcIntervalSeconds` | 300 |
   | `transactionCoordinatorScalableTopicsGcRetentionSeconds` | 900 |
   
   All three keys take effect only when 
`transactionCoordinatorScalableTopicsEnabled = true` (still off by default).
   
   ### Drive-by
   
   Refactored `fanOutEvents` to use 
`FutureUtil.waitForAll(List<CompletableFuture<Void>>)` — matches the new sweep 
methods and addresses the same comment lhotari left on P5.1.
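
For readers unfamiliar with the pattern, here is a minimal sketch of fanning out to several participants and waiting on all of the resulting futures. It deliberately uses only the JDK: the local `waitForAll` helper is a stand-in built on `CompletableFuture.allOf`, not Pulsar's `FutureUtil`, and `FanOutSketch` with its `publish` function is hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch of a fan-out that completes when every publish completes. */
final class FanOutSketch {
    /** JDK-only stand-in: completes when all futures complete, exceptionally if any failed. */
    static CompletableFuture<Void> waitForAll(List<CompletableFuture<Void>> futures) {
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }

    /** Publishes to each participant and returns one future covering the whole fan-out. */
    static CompletableFuture<Void> fanOutEvents(
            List<String> participants,
            Function<String, CompletableFuture<Void>> publish) {
        List<CompletableFuture<Void>> futures = participants.stream()
                .map(publish)
                .toList();
        return waitForAll(futures);
    }
}
```

Collecting the futures into a list first keeps the publish calls concurrent while still giving the caller one future to chain on or to inspect for failure.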
   
   ## Test plan
   
   - [x] `pulsar-broker:test --tests TransactionCoordinatorV5Test` — 5 new 
sweep cases plus all P5.1 cases:
     - `sweepTimeouts_abortsExpiredOpenTxnAndFansOut`
     - `sweepTimeouts_leavesUnexpiredOpenTxnAlone`
     - `sweepGc_deletesHeaderWhenNoOpsRemain`
     - `sweepGc_repairsAndRetainsHeaderWhenOpsRemain` (the fan-out-durability 
scenario)
     - `sweeps_skipWhenNotElected`
   - [x] `pulsar-broker:test --tests TxnMetadataStoreTest` / 
`MetadataTransactionBufferTest` / `MetadataPendingAckStoreTest` — green.
   - [x] Checkstyle clean (main + test).
   
   ## Deferred / follow-ups
   
   - **Per-partition sweep scoping** lands with the partitioned TC (P5.3), 
replacing the single-elected-sweeper interim.
   - **Pure metadata-store leader election** also belongs to P5.3.
   - A leftover op record from a permanently-gone participant (segment deleted) 
currently keeps its header alive forever: each GC pass re-drives the fan-out, 
which is harmless but never clears the record. A future phase can add forced 
cleanup, but doing so safely needs a participant-liveness signal that doesn't 
exist yet.

