mattisonchao opened a new pull request, #19374:
URL: https://github.com/apache/pulsar/pull/19374

   Fixes #19083 
   Fixes #11866 
   
   ### Motivation
   
   Currently, the delete namespace operation has the following steps:
   
   1. Run the pre-checks.
   2. Get the full list of topics.
   3. Mark the namespace as `deleted` so that a client lookup or reconnect cannot create the topic again.
   4. Delete all user-created topics.
   5. Delete all system topics.
   6. Delete the namespace event topics.
   7. Clean up the namespace metadata and resources.
   
   A race condition can leave the topic list from step 2 incomplete. For example:
   
   | -      | Thread A | Thread B |
   |--------|----------|----------|
   | time 1 | Got the full list of topics | Trying to create a new topic; passed the `deleted` check |
   | time 2 | Mark `deleted` | Do other checks |
   | time 3 | Do the rest of steps 4, 5, 6 | Created the managed ledger and persisted its info to metadata |
   | time 4 | Step 7: got exception `Directory not empty for /managed-ledgers/test-tenant/test-ns2/persistent` | Do the rest of the work |
   | time 5 | Return the exception | Topic created |
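
The interleaving in the table can be replayed sequentially with a small in-memory model. This is only a sketch: the class, its methods, and the topic names are hypothetical stand-ins for the namespace policy flag and the metadata store, not Pulsar APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory model of the race; none of these names are Pulsar APIs. */
final class DeleteNamespaceRace {
    // Stand-in for the children of the namespace's managed-ledger path in the metadata store.
    private final Set<String> persistedTopics = new TreeSet<>();
    // Stand-in for the `deleted` mark on the namespace.
    private boolean deleted = false;

    /** Thread B: the creation path checks the `deleted` mark first. */
    boolean passesDeletedCheck() {
        return !deleted;
    }

    /** Thread B: later, the managed ledger info is persisted to metadata. */
    void persistTopic(String topic) {
        persistedTopics.add(topic);
    }

    /** Thread A, step 2: snapshot of the topics that exist right now. */
    List<String> snapshotTopics() {
        return new ArrayList<>(persistedTopics);
    }

    /** Thread A, step 3. */
    void markDeleted() {
        deleted = true;
    }

    /** Thread A, steps 4 to 6: only topics in the snapshot are deleted. */
    void deleteTopics(List<String> snapshot) {
        persistedTopics.removeAll(snapshot);
    }

    /** Thread A, step 7: fails if any topic metadata is still present. */
    void cleanUpNamespace() {
        if (!persistedTopics.isEmpty()) {
            throw new IllegalStateException("Directory not empty: " + persistedTopics);
        }
    }

    /** Replays the table's interleaving; returns the exception message, or null on success. */
    static String replay() {
        DeleteNamespaceRace ns = new DeleteNamespaceRace();
        ns.persistTopic("existing-topic");
        List<String> snapshot = ns.snapshotTopics();   // time 1, thread A
        boolean passed = ns.passesDeletedCheck();      // time 1, thread B
        ns.markDeleted();                              // time 2, thread A
        ns.deleteTopics(snapshot);                     // time 3, thread A
        if (passed) {
            ns.persistTopic("new-topic");              // time 3, thread B
        }
        try {
            ns.cleanUpNamespace();                     // time 4, thread A
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

Because thread B passed the check before the mark was set, its topic is missing from thread A's snapshot, so the final cleanup finds leftover metadata.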
    
   
   This problem also exists for user-created topics, but I don't think we need to deal with them: the user can run the command again, and it will clean up successfully. Guaranteeing cleanup in a single run would require a heavyweight distributed lock.
   However, we do need to handle the internal `change_event` topic. We can always assume that a `change_event` topic exists, which avoids the race condition where the topic passes the check but is not yet persisted in the metadata.
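
The idea of always assuming the event topic exists can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the class and method names are hypothetical, and the local topic name `__change_events` is assumed here for what the description calls `change_event`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of Pulsar. */
final class EventTopicAssumption {
    // Assumed local name of the namespace event topic (an assumption, see the lead-in).
    static final String EVENT_TOPIC_LOCAL_NAME = "__change_events";

    /**
     * Builds the set of topics to delete. The event topic is always added,
     * even when the listing did not return it, because it may have passed
     * the creation check without being persisted to metadata yet.
     */
    static Set<String> topicsToDelete(String namespace, List<String> listedTopics) {
        Set<String> result = new LinkedHashSet<>(listedTopics);
        result.add("persistent://" + namespace + "/" + EVENT_TOPIC_LOCAL_NAME);
        return result;
    }

    /** Deletes the targets from an in-memory store; a missing topic is treated as a no-op. */
    static int deleteAll(Set<String> store, Set<String> targets) {
        int removed = 0;
        for (String topic : targets) {
            if (store.remove(topic)) {
                removed++;
            }
        }
        return removed;
    }
}
```

With this approach the delete step has to tolerate a topic that does not exist, since the event topic is included whether or not it was ever created.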
   
   I tested this 1,000 times in my local environment, and it works well.
   <img width="657" alt="image" src="https://user-images.githubusercontent.com/74767115/215781535-c993e7f3-aa8d-406e-b720-0aa97f72dfab.png">
   
   
   ### Modifications
   
   > Deleting topic xxxx because local cluster is not part of global namespace repl list
   
   - Remove the `namespacePolicies.deleted` logic so that a new topic is not deleted by `PersistentTopic#checkReplication`.
   - Always assume that an event topic exists, which avoids the race condition where the topic passes the check but is not yet persisted in the metadata.
   - Add a test case that checks whether any topic name object is left behind.
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. Please attach the local 
preview screenshots (run `sh start.sh` at `pulsar/site2/website`) to your PR 
description, or else your PR might not get merged. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update 
later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->

