mattisonchao opened a new pull request, #15233:
URL: https://github.com/apache/pulsar/pull/15233

   ### Motivation
   
   In #4455 and #10740, `OpAddEntry` uses `opAddEntry.addOpCount` to ensure 
that the `opAddEntry` we hold is the one we expect, which avoids a race 
condition caused by recycling.
   
   However, there still seems to be a case that causes another race 
condition. Consider the following scenario:
   
   #### Precondition
   
   - We have an `OpAddEntry` named A.
   - We have two threads: a timeout check thread, and a write thread 
that wants to complete and recycle this `OpAddEntry`.
   - The relevant code (timeout check logic) is as follows:
   
   
https://github.com/apache/pulsar/blob/caf7648653c140c9f922e86425585ab7b5ed3ed6/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java#L3825-L3840
   
   #### Race condition process
   
   1. The timeout checker thread gets `OpAddEntry` A and runs until 
line 3835.
   2. At the same time, the write thread also gets `OpAddEntry` A, completes 
it, and recycles it.
   3. Another thread then takes the recycled `OpAddEntry` A out of the 
recycling pool and reuses it as `OpAddEntry` B. By this point the timeout 
thread has already passed the timeout check (line 3834), so the `addOpCount` 
it reads comes from `OpAddEntry` B, which causes the `compareAndSet` check to 
pass.
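   
   The interleaving above can be replayed deterministically on a single 
thread. The following is a minimal sketch, not the Pulsar code: `PooledOp`, 
`reuse`, and `failIfCountMatches` are hypothetical names standing in for a 
pooled `OpAddEntry`, its reuse from the recycling pool, and the 
`compareAndSet` guard on `addOpCount`. The timeout decision is modelled as a 
plain boolean.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

public class StaleCountRaceSketch {
    /** Simplified stand-in for a pooled OpAddEntry (hypothetical, not the real class). */
    static final class PooledOp {
        static final AtomicLongFieldUpdater<PooledOp> ADD_OP_COUNT =
                AtomicLongFieldUpdater.newUpdater(PooledOp.class, "addOpCount");
        volatile long addOpCount;

        /** Models recycling the instance and handing it out again with a new count. */
        void reuse(long newCount) {
            addOpCount = newCount;
        }

        /** Marks the op as failed only if it still carries the expected count. */
        boolean failIfCountMatches(long expectedCount) {
            return ADD_OP_COUNT.compareAndSet(this, expectedCount, -1);
        }
    }

    /** Replays steps 1-3 in order; returns true if op B is wrongly timed out. */
    static boolean replayRace() {
        PooledOp op = new PooledOp();
        op.reuse(1);                    // the instance currently represents op A
        boolean aTimedOut = true;       // step 1: the checker decides that A timed out
        op.reuse(2);                    // steps 2-3: A is recycled and reused as op B
        long observed = op.addOpCount;  // the checker reads the count only now: B's value
        return aTimedOut && op.failIfCountMatches(observed); // the guard passes against B
    }

    public static void main(String[] args) {
        System.out.println("B wrongly timed out: " + replayRace());
    }
}
```

   Because the count is read after the instance has been reused, the expected 
value handed to the guard always matches the current one, so the guard cannot 
detect the recycle.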
   
   #### Effect
   
   - The timeout checker wrongly fails the new `OpAddEntry` (B) with a 
timeout.
   
   ### Modifications
   
   - Record `addOpCount` before the timeout check.
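   
   A minimal sketch of this ordering, under the same kind of simplification 
as the scenario described in the motivation: `PooledOp`, `reuse`, and 
`failIfCountMatches` are hypothetical stand-ins for a pooled `OpAddEntry`, its 
reuse from the recycling pool, and the `compareAndSet` guard. It is not the 
actual patch, only an illustration of why reading the count first helps.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

public class SnapshotCountSketch {
    /** Simplified stand-in for a pooled OpAddEntry (hypothetical, not the real class). */
    static final class PooledOp {
        static final AtomicLongFieldUpdater<PooledOp> ADD_OP_COUNT =
                AtomicLongFieldUpdater.newUpdater(PooledOp.class, "addOpCount");
        volatile long addOpCount;

        /** Models recycling the instance and handing it out again with a new count. */
        void reuse(long newCount) {
            addOpCount = newCount;
        }

        /** Marks the op as failed only if it still carries the expected count. */
        boolean failIfCountMatches(long expectedCount) {
            return ADD_OP_COUNT.compareAndSet(this, expectedCount, -1);
        }
    }

    /** Same interleaving as the race, but the count is recorded first. */
    static boolean replayWithSnapshot() {
        PooledOp op = new PooledOp();
        op.reuse(1);                    // the instance currently represents op A
        long snapshot = op.addOpCount;  // record A's count before the timeout check
        boolean aTimedOut = true;       // the checker decides that A timed out
        op.reuse(2);                    // A is recycled and reused as op B
        // The guard compares A's count (1) with B's count (2) and fails.
        return aTimedOut && op.failIfCountMatches(snapshot);
    }

    public static void main(String[] args) {
        System.out.println("B wrongly timed out: " + replayWithSnapshot());
    }
}
```

   With the count recorded up front, a recycle that happens after the 
snapshot changes the current value, so the guard rejects the stale timeout 
and op B is left untouched.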
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   ### Documentation
   
   - [x] `no-need-doc` 

