BewareMyPower opened a new pull request, #16043:
URL: https://github.com/apache/pulsar/pull/16043
…ated concurrently
### Motivation
When a partitioned topic was created concurrently, especially when
automatically created by many producers. This case can be reproduced
easily by configuring `allowAutoTopicCreationType=non-partitioned` and
starting a Pulsar standalone. Then, run the following code:
```java
try (PulsarClient client = PulsarClient.builder()
.serviceUrl("pulsar://localhost:6650").build()) {
for (int i = 0; i < 10; i++) {
client.newProducer().topic("topic").createAsync();
}
Thread.sleep(1000);
}
```
We can see a lot of "Could not get connection while
getPartitionedTopicMetadata" warning logs at client side, while there
were more warning logs with full stack traces at broker side:
```
2022-06-14T02:04:20,522+0800 [metadata-store-22-1] WARN
org.apache.pulsar.broker.service.ServerCnx - Failed to get Partitioned Metadata
[/127.0.0.1:64846] persistent://public/default/topic:
org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException:
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
BadVersion for /admin/partitioned-topics/public/default/persistent/topic
org.apache.pulsar.metadata.api.MetadataStoreException$AlreadyExistsException:
org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException:
org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
BadVersion for /admin/partitioned-topics/public/default/persistent/topic
```
It's because when broker handles the partitioned metadata command, it
calls `fetchPartitionedTopicMetadataCheckAllowAutoCreationAsync` and
will try creating a partitioned topic if it doesn't exist. It's a race
condition that if many connections are established during a short time
interval and one of them created successfully, the following will fail
with the `AlreadyExistsException`.
### Modifications
Handles the `MetadataStoreException.AlreadyExistsException` in
`unsafeGetPartitionedTopicMetadataAsync`. In this case, invoke
`fetchPartitionedTopicMetadataAsync` to get the partitioned metadata
again.
### Verifying this change
Even if without this patch, the creation of producers could also succeed
because they will reconnect to broker again after 100 ms because broker
will return a `ServiceNotReady` error in thiss case. The only way to
verify this fix is reproducing the bug again with this patch, we can
see no reconnection will happen from the logs.
### Documentation
Check the box below or label this PR directly.
Need to update docs?
- [ ] `doc-required`
(Your PR needs to update docs and you will update later)
- [x] `doc-not-needed`
(Please explain why)
- [ ] `doc`
(Your PR contains doc changes)
- [ ] `doc-complete`
(Docs have been already added)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]