mattisonchao opened a new pull request, #15233: URL: https://github.com/apache/pulsar/pull/15233
### Motivation

In #4455 and #10740, `OpAddEntry` started using `opAddEntry.addOpCount` to make sure the `opAddEntry` we hold is the one we expect (avoiding race conditions caused by recycling).

However, there appears to be a case that still leads to a race condition. Consider the following:

#### Precondition

- We have an `OpAddEntry` named A.
- We have two threads: a timeout check thread and a write thread that wants to complete and recycle this `OpAddEntry`.
- The relevant code (timeout check logic) is:
  https://github.com/apache/pulsar/blob/caf7648653c140c9f922e86425585ab7b5ed3ed6/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L3825-L3840

#### Race condition process

1. The timeout checker thread gets `OpAddEntry` A and runs up to line 3835.
2. At the same time, the write thread also gets `OpAddEntry` A, completes it, and recycles it.
3. `OpAddEntry` A is then taken out of the recycling pool by another thread as `OpAddEntry` B. By this point the timeout thread has already passed the timeout check (line 3834), so `addOpCount` is read from `OpAddEntry` B, which causes the `compareAndSet` check to pass.

#### Effect

- The timeout checker completes a new, unrelated `OpAddEntry` with a timeout failure.

### Modifications

- Record `addOpCount` before the timeout check.

### Verifying this change

- [x] Make sure that the change passes the CI checks.

### Documentation

- [x] `no-need-doc`
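
The race described under "Race condition process" and the change listed under "Modifications" can be illustrated with a small, deterministic sketch. This is not the code in `ManagedLedgerImpl`: `PooledOp`, `checkAddTimeout`, `simulate`, and the tick-based timestamps are hypothetical names and simplifications, and the write thread's recycle-and-reuse step is replayed by a `Runnable` that runs between the timeout check and the `compareAndSet`.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

public class AddTimeoutRaceSketch {

    /** Hypothetical stand-in for a pooled OpAddEntry; only the fields the race touches. */
    static final class PooledOp {
        static final AtomicLongFieldUpdater<PooledOp> ADD_OP_COUNT =
                AtomicLongFieldUpdater.newUpdater(PooledOp.class, "addOpCount");

        volatile long addOpCount = -1;
        volatile long lastInitTime = -1;

        void init(long count, long now) {
            addOpCount = count;
            lastInitTime = now;
        }

        void recycle() {
            addOpCount = -1;
            lastInitTime = -1;
        }

        // Succeeds only if the op still carries the count the caller expects.
        boolean failWithTimeout(long expectedCount) {
            return ADD_OP_COUNT.compareAndSet(this, expectedCount, -1);
        }
    }

    /**
     * Simplified timeout check. `interleaved` stands in for the write thread
     * running between the timeout check and the compareAndSet.
     * Returns true if the op currently held in `op` was failed with a timeout.
     */
    static boolean checkAddTimeout(PooledOp op, long now, long timeout,
                                   boolean recordCountFirst, Runnable interleaved) {
        long recorded = op.addOpCount; // read BEFORE the timeout check
        boolean timedOut = op.lastInitTime != -1 && now - op.lastInitTime >= timeout;
        if (!timedOut) {
            return false;
        }
        interleaved.run();
        // Racy variant re-reads the count here, so it may see the reused op's count.
        long expected = recordCountFirst ? recorded : op.addOpCount;
        return op.failWithTimeout(expected);
    }

    /** Replays the interleaving from the description; true means op "B" was wrongly timed out. */
    public static boolean simulate(boolean recordCountFirst) {
        PooledOp op = new PooledOp();
        op.init(1, 0); // op "A", created at tick 0
        Runnable writer = () -> {
            op.recycle();   // write thread completes and recycles A
            op.init(2, 20); // the pool hands the same object out again as "B"
        };
        return checkAddTimeout(op, 20, 10, recordCountFirst, writer);
    }

    public static void main(String[] args) {
        System.out.println("count read after check:  " + simulate(false)); // true
        System.out.println("count read before check: " + simulate(true));  // false
    }
}
```

With the count recorded first, the `compareAndSet` compares against A's count and fails once the object has been reused as B, so the new entry is left alone.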
