shibd opened a new pull request, #22860: URL: https://github.com/apache/pulsar/pull/22860
### Motivation We observe that a `normal topic` might reference a `closed ledger` and it never auto recover. will cause the producer and customer stuck. The root cause is related to `topic create timeout`. Assuming we have two concurrent requests to create a topic: `firstCreate(Thread-1)`, `secondCreate(Thread-2)` | Thread-1 | Thread-2 | |----------------|-------------------------------------------------| | firstCreate topic timeout | | | | [remove this old topic and recreate a new topic](https://github.com/shibd/pulsar/blob/f07b3a030179c38f9786b3e26c82aa13e00b34a6/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1009-L1012) | | | Open ledger from cache, and get an old ledger that create by firstCreate topic | | call [topic.close ](https://github.com/apache/pulsar/blob/31bfded7d717afd3e76de4d839bdb3daebf704a0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1749-L1755) | | | old ledger close and remove it from cache | | | | but this `old ledger` being referenced to `new topic` and that stats is close | - When the `firstCreate` topic timeout, will call `topic.close`. it will close the ledger, and remove it from the ledger cache. https://github.com/apache/pulsar/blob/f07b3a030179c38f9786b3e26c82aa13e00b34a6/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1745-L1752 - If the `secondCreate` request successfully creates a topic before the `old ledger closes`, the reference will be made to the `old ledger`. ### Modifications Refactor `BrokerService#getTopic()` method: - If any exceptions occur during the creation topics process, they should be proactively and asynchronously removed from the topics: `pulsar.getExecutor().execute(() -> topics.remove(topicName.toString()));` - The new get topic operation should not remove the `topicFuture` that created by previous get topic operation. If the previous topicFuture is not removed from the map yet, the broker should always use the existing topicFuture (waiting for completion or return an error to the client side directly) - Add documentation to clearly define the method boundaries - During the process of creating a topic, all the remove methods should `not include topicFuture` because the actual future object placed in the map might not be the same as the `topicFuture`. You can check this [code](https://github.com/apache/pulsar/blob/21596227415968dd9b9219779657ff74decfa45b/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java#L1042-L1073) to validation it. ### Verifying this change - Add testCloseLedgerThatTopicAfterCreateTimeout to cover this case. ### Does this pull request potentially affect one of the following parts: <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. --> *If the box was checked, please highlight the changes* - [ ] Dependencies (add or upgrade a dependency) - [ ] The public API - [ ] The schema - [ ] The default values of configurations - [ ] The threading model - [ ] The binary protocol - [ ] The REST endpoints - [ ] The admin CLI options - [ ] The metrics - [ ] Anything that affects deployment ### Documentation <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. --> - [ ] `doc` <!-- Your PR contains doc changes. --> - [ ] `doc-required` <!-- Your PR changes impact docs and you will update later --> - [x] `doc-not-needed` <!-- Your PR changes do not impact docs --> - [ ] `doc-complete` <!-- Docs have been already added --> ### Matching PR in forked repository PR in forked repository: https://github.com/shibd/pulsar/pull/37 <!-- After opening this PR, the build in apache/pulsar will fail and instructions will be provided for opening a PR in the PR author's forked repository. apache/pulsar pull requests should be first tested in your own fork since the apache/pulsar CI based on GitHub Actions has constrained resources and quota. GitHub Actions provides separate quota for pull requests that are executed in a forked repository. The tests will be run in the forked repository until all PR review comments have been handled, the tests pass and the PR is approved by a reviewer. --> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org