shibd opened a new pull request, #22860:
URL: https://github.com/apache/pulsar/pull/22860

   ### Motivation
   We observe that a `normal topic` might reference a `closed ledger` and it 
never auto recover. will cause the producer and customer stuck.
   
   The root cause is related to `topic create timeout`. Assuming we have two 
concurrent requests to create a topic: `firstCreate(Thread-1)`, 
`secondCreate(Thread-2)`
   
   
   | Thread-1                                            | Thread-2             
                                                         |
   |----------------|-------------------------------------------------|
   | firstCreate topic timeout                              |                   
                                                            |
   |                                                     | [remove this old 
topic and recreate a new 
topic](https://github.com/shibd/pulsar/blob/f07b3a030179c38f9786b3e26c82aa13e00b34a6/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1009-L1012)
                                |
   |                                                     | Open ledger from 
cache, and get an old ledger  that create by firstCreate topic                  
                      |
   | call [topic.close  
](https://github.com/apache/pulsar/blob/31bfded7d717afd3e76de4d839bdb3daebf704a0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1749-L1755)
                              |                                                 
                              |
   | old ledger close and remove it from cache |                                
                                               |
   |                                                     | but this `old 
ledger` being referenced to `new topic` and that stats is close |
   
   
   - When the `firstCreate` topic timeout, will call  `topic.close`. it will 
close the ledger, and remove it from the ledger cache.
   
https://github.com/apache/pulsar/blob/f07b3a030179c38f9786b3e26c82aa13e00b34a6/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1745-L1752
   
   
   - If the `secondCreate` request successfully creates a topic before the `old 
ledger closes`, the reference will be made to the `old ledger`.
   
   
   ### Modifications
   Refactor `BrokerService#getTopic()` method:
     - If any exceptions occur during the creation topics process, they should 
be proactively and asynchronously removed from the topics: 
`pulsar.getExecutor().execute(() -> topics.remove(topicName.toString()));`
     - The new get topic operation should not remove the `topicFuture` that 
created by previous get topic operation. If the previous topicFuture is not 
removed from the map yet, the broker should always use the existing topicFuture 
(waiting for completion or return an error to the client side directly)
     - Add documentation to clearly define the method boundaries
     - During the process of creating a topic, all the remove methods should 
`not include topicFuture` because the actual future object placed in the map 
might not be the same as the `topicFuture`. You can check this 
[code](https://github.com/apache/pulsar/blob/21596227415968dd9b9219779657ff74decfa45b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1042-L1073)
 to validation it.
   
   
   ### Verifying this change
   - Add testCloseLedgerThatTopicAfterCreateTimeout to cover this case.
   
   ### Does this pull request potentially affect one of the following parts:
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update 
later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->
   
   ### Matching PR in forked repository
   
   PR in forked repository: https://github.com/shibd/pulsar/pull/37
   
   <!--
   After opening this PR, the build in apache/pulsar will fail and instructions 
will
   be provided for opening a PR in the PR author's forked repository.
   
   apache/pulsar pull requests should be first tested in your own fork since 
the 
   apache/pulsar CI based on GitHub Actions has constrained resources and quota.
   GitHub Actions provides separate quota for pull requests that are executed 
in 
   a forked repository.
   
   The tests will be run in the forked repository until all PR review comments 
have
   been handled, the tests pass and the PR is approved by a reviewer.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to