void-ptr974 opened a new pull request, #25953:
URL: https://github.com/apache/pulsar/pull/25953

   ### Motivation
   
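   Message deduplication has a `Recovering` status to prevent overlapping state transitions, but the enable path started cursor replay without first moving the status to `Recovering`. As a result, a second status check that arrived during replay could not tell that recovery was already in progress.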
   
   A frequently hit case is loading a topic that already has topic-level policies. During topic load, dedup status checks can be triggered from two flows before the first recovery replay completes:
   
   | Time | Flow A: existing topic-level policy during load | Flow B: normal topic load chain | Dedup status / effect |
   |------|--------------------------------------------------|----------------------------------|-----------------------|
   | T1 | `initTopicPolicy()` loads existing topic policies | | `Initialized` or `Disabled` |
   | T2 | `onUpdate()` applies topic policies and calls `checkDeduplicationStatus()` | | First dedup recovery replay starts |
   | T3 | The first replay is asynchronous and still running | | Status is still `Initialized` or `Disabled` because enable did not set `Recovering` |
   | T4 | | Topic load continues and `BrokerService` explicitly calls `checkDeduplicationStatus()` | |
   | T5 | | The second check also enters the enable path | A second replay starts for the same dedup cursor |
   | T6 | First replay is rebuilding producer sequence state | Second replay is also rebuilding producer sequence state | Overlapping replay can advance shared replay/cursor state and leave recovered dedup state incomplete or inconsistent |
   
   Dedup recovery rebuilds producer sequence information from the dedup cursor. If overlapping replay leaves the recovered sequence state incomplete or inconsistent, the broker may fail to recognize already-published messages as duplicates and accept them again after recovery.
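
A minimal sketch, assuming a simplified model, of why incomplete recovery lets duplicates through: replay folds each persisted (producer name, sequence id) pair into a per-producer highest sequence id, and a publish counts as a duplicate when its sequence id does not exceed that value. `SequenceDedupSketch`, `Entry`, `replay` and `isDuplicate` are hypothetical names for illustration, not Pulsar's API, and the broker's real dedup state is richer than a single map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of sequence-id deduplication; not Pulsar's actual classes.
public class SequenceDedupSketch {

    /** One replayed entry: which producer wrote it and with which sequence id. */
    record Entry(String producerName, long sequenceId) {}

    // Highest sequence id seen per producer, rebuilt by replay.
    private final Map<String, Long> highestSequenceId = new HashMap<>();

    /** Rebuild state by replaying entries, keeping the max sequence id per producer. */
    void replay(List<Entry> entries) {
        for (Entry e : entries) {
            highestSequenceId.merge(e.producerName(), e.sequenceId(), Math::max);
        }
    }

    /** A publish is a duplicate if its sequence id does not exceed the recovered high-water mark. */
    boolean isDuplicate(String producerName, long sequenceId) {
        Long highest = highestSequenceId.get(producerName);
        return highest != null && sequenceId <= highest;
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(new Entry("p1", 0), new Entry("p1", 1), new Entry("p1", 2));

        SequenceDedupSketch complete = new SequenceDedupSketch();
        complete.replay(log);
        System.out.println("complete replay, resend seq 2 duplicate? " + complete.isDuplicate("p1", 2));

        // Simulate a replay that stopped early, e.g. because another replay advanced the shared cursor.
        SequenceDedupSketch incomplete = new SequenceDedupSketch();
        incomplete.replay(log.subList(0, 2));
        System.out.println("incomplete replay, resend seq 2 duplicate? " + incomplete.isDuplicate("p1", 2));
    }
}
```

Running `main` prints `true` for the complete replay and `false` for the truncated one: the retried message with sequence id 2 is only caught when replay saw the entry that carried it.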
   
   There was also a retry gap after an enable failure. A transient failure moved deduplication to `Failed`, but later checks did not retry enabling even when the topic policy still required deduplication.
   
   ### Modifications
   
   - Move deduplication to `Recovering` before starting cursor replay.
   - Allow `Failed` status to retry enabling when deduplication should be 
enabled.
   - Allow `Failed` status to proceed with disabling when deduplication should 
be disabled.
   - Keep enable failures visible by leaving the status as `Failed`.
   - Add tests for concurrent recovery and retry after failed enable.
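
One way to picture the status handling listed above is a small state machine. The Java below is a hypothetical sketch, not the code in this PR: `DedupStatusSketch`, `checkStatus` and the `Supplier` that starts an asynchronous replay are illustrative assumptions, `Enabled` is assumed as the success state, and the claim on `Recovering` uses `AtomicReference.compareAndSet` so that only one of two overlapping checks can start a replay. `Failed` is accepted as a starting state both for retrying enable and for disabling.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical, simplified model of the dedup status handling; not Pulsar's MessageDeduplication.
public class DedupStatusSketch {

    enum Status { Initialized, Enabled, Disabled, Recovering, Failed }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.Initialized);
    private final Supplier<CompletableFuture<Void>> startReplay; // starts an asynchronous cursor replay (assumed)

    DedupStatusSketch(Supplier<CompletableFuture<Void>> startReplay) {
        this.startReplay = startReplay;
    }

    Status status() {
        return status.get();
    }

    /** Called whenever a policy update or topic load asks whether dedup should be on. */
    void checkStatus(boolean shouldBeEnabled) {
        Status current = status.get();
        if (shouldBeEnabled) {
            // Only Initialized, Disabled or Failed may start recovery; Failed allows a retry.
            if (current != Status.Initialized && current != Status.Disabled && current != Status.Failed) {
                return; // already Enabled, or a replay is in progress
            }
            // Claim Recovering before starting replay; a concurrent check loses the CAS and returns.
            if (!status.compareAndSet(current, Status.Recovering)) {
                return;
            }
            startReplay.get().whenComplete((ignored, error) ->
                    // Leave failures visible as Failed so a later check can retry.
                    status.set(error == null ? Status.Enabled : Status.Failed));
        } else if (current == Status.Enabled || current == Status.Failed) {
            // Disabling proceeds from Enabled and also from Failed; other states are left alone here.
            status.compareAndSet(current, Status.Disabled);
        }
    }

    public static void main(String[] args) {
        AtomicInteger replays = new AtomicInteger();
        CompletableFuture<Void> firstReplay = new CompletableFuture<>();
        DedupStatusSketch dedup = new DedupStatusSketch(() -> {
            replays.incrementAndGet();
            return firstReplay;
        });

        dedup.checkStatus(true); // flow A: policy update starts the replay
        dedup.checkStatus(true); // flow B: topic load checks again while the replay is still running
        System.out.println("replays started: " + replays.get() + ", status: " + dedup.status());

        firstReplay.completeExceptionally(new RuntimeException("transient failure"));
        System.out.println("after failure: " + dedup.status());
    }
}
```

`main` prints `replays started: 1, status: Recovering` and then `after failure: Failed`. Setting `Recovering` before the replay starts is what makes the second check at T5 in the table return early; the compare-and-set additionally guards against two checks that read the same status at the same moment.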
   
   ### Verifications
   
   ```bash
   ./gradlew :pulsar-broker:test --tests org.apache.pulsar.broker.BrokerMessageDeduplicationTest
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
