poorbarcode opened a new pull request, #21948:
URL: https://github.com/apache/pulsar/pull/21948
### Motivation
A race condition can leave an orphan replicator on the original owner of a
topic. The new owner of the topic then cannot start a replicator and fails with
`org.apache.pulsar.broker.service.BrokerServiceException$NamingException
Producer with name 'pulsar.repl.{local_cluster}-->{remote_cluster}' is already
connected to topic`.
**Scenario 1**
- Thread-1: start/restart the producer of the replicator.
- Thread-2: unloading bundles.
**Scenario 2**
- Thread-1: start a new replicator after `replication_clusters` is updated.
- Thread-2: unloading bundles.
Scenario 1 was solved by https://github.com/apache/pulsar/pull/21946; the
current PR focuses on Scenario 2.
**Steps of Scenario 2**
| time | thread `enable replication` | thread `unload bundle` |
| --- | --- | --- |
| 1 | Enable replication | |
| 2 | | Mark topic as `closing` |
| 3 | | Skip `replicator.disconnect()` because `topic.replicators` is empty |
| 4 | Initialize cursor `pulsar.repl` | |
| 5 | Start producer | |
| 6 | Set `replicator.stat --> Starting` | |
| 7 | Producer is created successfully; set `replicator.stat --> Started` | |
| 8 | Trigger a `readMoreEntries`; since there are no entries to read, the request stays pending | |
| 9 | | Close cursor `pulsar.repl` |
| 10 | | Close managed ledger |
| 11 | An orphan replicator remains, and the next topic owner cannot start a replicator due to `Producer with name 'pulsar.repl.{local_cluster}-->{remote_cluster}' is already connected to topic` | |
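
The interleaving above can be replayed deterministically with a toy model. This is only a sketch: `RemoteCluster`, `ToyTopic`, and all their methods are hypothetical stand-ins, not Pulsar classes, and it assumes (as the error message suggests) that the remote side rejects a second producer with the same name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy replay of the table above; none of these types are Pulsar's real ones. */
final class OrphanReplicatorReplay {
    /** Hypothetical remote cluster that rejects duplicate producer names. */
    static final class RemoteCluster {
        final Set<String> connectedProducers = new HashSet<>();

        void connect(String producerName) {
            if (!connectedProducers.add(producerName)) {
                throw new IllegalStateException(
                        "Producer with name '" + producerName + "' is already connected to topic");
            }
        }
    }

    /** Hypothetical topic on one owner broker. */
    static final class ToyTopic {
        final List<String> replicators = new ArrayList<>();
        boolean closing;

        /** Unload path, steps 2-3: mark closing, then disconnect whatever is registered. */
        int markClosingAndDisconnect(RemoteCluster remote) {
            closing = true;
            int disconnected = replicators.size();
            for (String name : replicators) {
                remote.connectedProducers.remove(name);
            }
            replicators.clear();
            return disconnected;
        }

        /** Enable-replication path, steps 4-7. In this model it does not check `closing`. */
        void startReplicator(String producerName, RemoteCluster remote) {
            remote.connect(producerName);
            replicators.add(producerName);
        }
    }

    /** Replays the interleaving; returns true if the next owner hits the name conflict. */
    static boolean replay() {
        RemoteCluster remote = new RemoteCluster();
        ToyTopic oldOwner = new ToyTopic();
        String name = "pulsar.repl.local-->remote";

        oldOwner.markClosingAndDisconnect(remote); // steps 2-3: list is empty, nothing disconnected
        oldOwner.startReplicator(name, remote);    // steps 4-7: producer connects anyway
        // steps 9-10: cursor and managed ledger close, but nobody closes the producer

        ToyTopic newOwner = new ToyTopic();
        try {
            newOwner.startReplicator(name, remote); // step 11 on the next owner
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

In this model the conflict arises because the disconnect pass runs before the replicator is registered, so the producer that connects afterwards is never cleaned up.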
Because the scenario is too complex, I could not add a test for it.
TODO: reproduce Scenario 2 locally.
### Modifications
- Call `replicators.disconnect` after the managed ledger is closed. This
ordering prevents a new cursor (`pulsar.dedup`) from being created.
- `topic.close` now completes only after `replicators.disconnect`. This keeps
the new replicator on the topic's next owner broker from failing to create
its internal producer with
`org.apache.pulsar.broker.service.BrokerServiceException$NamingException
Producer with name 'pulsar.repl.{local_cluster}-->{remote_cluster}' is already
connected to topic`.
- After https://github.com/apache/pulsar/pull/21947 the operation
`replicator.producer.close` will no longer fail.
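
The close ordering described in this list can be sketched as a `CompletableFuture` chain. This is an illustration under stated assumptions, not the PR's actual code: `CloseOrderSketch` and its methods are hypothetical, and the model assumes a closed managed ledger refuses to open new cursors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model of the close order; not Pulsar's PersistentTopic. */
final class CloseOrderSketch {
    final List<String> steps = new ArrayList<>();
    final List<String> replicators = new ArrayList<>();
    boolean ledgerClosed;

    /** Closing the ledger first means no late request can open a new cursor (assumption). */
    CompletableFuture<Void> closeManagedLedger() {
        ledgerClosed = true;
        steps.add("closeManagedLedger");
        return CompletableFuture.completedFuture(null);
    }

    /** Runs after the ledger is closed, so it sees every replicator that managed to start. */
    CompletableFuture<Void> disconnectReplicators() {
        replicators.clear();
        steps.add("disconnectReplicators");
        return CompletableFuture.completedFuture(null);
    }

    /** A late "enable replication" request; rejected once the ledger is closed. */
    boolean tryStartReplicator(String name) {
        if (ledgerClosed) {
            return false;
        }
        replicators.add(name);
        return true;
    }

    /** The topic reports itself closed only after the replicators are disconnected. */
    CompletableFuture<Void> close() {
        return closeManagedLedger()
                .thenCompose(ignored -> disconnectReplicators())
                .thenRun(() -> steps.add("topicClosed"));
    }
}
```

With this order, a replicator that starts before the ledger closes is still removed by the disconnect step, and one that arrives afterwards is rejected, so no producer outlives the close in this model.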
### Documentation
<!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
- [ ] `doc` <!-- Your PR contains doc changes. -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update
later -->
- [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->
### Matching PR in forked repository
PR in forked repository: x