TakaHiR07 opened a new issue, #20508: URL: https://github.com/apache/pulsar/issues/20508
### Search before asking

- [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Version

master

### Minimal reproduce step

In our production environment, only one broker was down while the others remained available. However, clients configured with multiple service URLs could not reconnect on retry and kept throwing connection-timeout exceptions.

The root cause is in PulsarServiceNameResolver#resolveHost. PulsarClient uses the same PulsarServiceNameResolver instance to resolve the host on every connection retry, and the retry logic in PulsarServiceNameResolver is round-robin. However, when PulsarClient creates a producer on a partitioned topic, all partitions share that one PulsarServiceNameResolver. Every partition's retry advances the shared position, so from the point of view of a single partition the sequence of hosts is not round-robin but effectively random, and a partition can be handed the down broker several times in a row. The more partitions the topic has, the more likely this bug is to occur.

### What did you expect to see?

No single point of failure: with multiple service URLs configured, one broker going down should not prevent clients from connecting.

### What did you see instead?

A single point of failure.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [X] I'm willing to submit a PR!
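
The failure mode described in the reproduce step can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions: `RoundRobinResolver` is a hypothetical stand-in, not the real PulsarServiceNameResolver, the broker names are made up, and the partitions are assumed to retry in strict lockstep so that the result is deterministic. In a real client the interleaving depends on timing, which is why the observed order looks random.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedResolverSketch {

    /** Hypothetical round-robin resolver; a stand-in, not Pulsar's actual class. */
    static final class RoundRobinResolver {
        private final List<String> hosts;
        private int index = -1;

        RoundRobinResolver(List<String> hosts) {
            this.hosts = List.copyOf(hosts);
        }

        /** Advances the shared position and returns the next host. */
        synchronized String resolveHost() {
            index = (index + 1) % hosts.size();
            return hosts.get(index);
        }
    }

    /**
     * Simulates `partitions` connections that all retry against ONE shared
     * resolver, taking turns in lockstep (an assumption made for determinism).
     * Returns the hosts handed to partition 0 over `rounds` retries.
     */
    static List<String> hostsSeenByPartitionZero(List<String> hosts, int partitions, int rounds) {
        RoundRobinResolver shared = new RoundRobinResolver(hosts);
        List<String> seen = new ArrayList<>();
        for (int round = 0; round < rounds; round++) {
            for (int p = 0; p < partitions; p++) {
                String host = shared.resolveHost();
                if (p == 0) {
                    seen.add(host);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Illustrative host names only.
        List<String> hosts = List.of("broker-1", "broker-2", "broker-3");

        // One connection: a true round-robin over all hosts.
        System.out.println(hostsSeenByPartitionZero(hosts, 1, 4));
        // prints [broker-1, broker-2, broker-3, broker-1]

        // Three partitions sharing the resolver: partition 0 always gets the same host.
        System.out.println(hostsSeenByPartitionZero(hosts, 3, 4));
        // prints [broker-1, broker-1, broker-1, broker-1]
    }
}
```

In the second call, if broker-1 happens to be the broker that is down, partition 0 never reaches a healthy broker even though two are available. One possible direction, offered here only as an illustration and not as the project's chosen fix, is to keep the retry position per connection attempt rather than in state shared by all partitions.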
