TakaHiR07 opened a new issue, #20508:
URL: https://github.com/apache/pulsar/issues/20508

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Version
   
   master
   
   ### Minimal reproduce step
   
   In our production environment, one broker went down while the others 
remained available. However, clients configured with multiple service URLs 
could not reconnect on retry and kept throwing connectionTimeout exceptions.
   
   The root cause is in PulsarServiceNameResolver#resolveHost. PulsarClient 
uses the same PulsarServiceNameResolver instance to resolve the host on every 
connection retry, and the retry logic in PulsarServiceNameResolver is round-robin.
   
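   However, if we use PulsarClient to create a producer on a partitioned topic, 
all the partitions share the same PulsarServiceNameResolver. Every partition's 
retry advances the shared position, so from the point of view of a single 
partition the retry order is effectively random rather than round-robin.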
   
   The more partitions the topic has, the more easily this bug occurs.
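
   The effect described above can be illustrated with a minimal sketch. It is a 
simplified model, not the actual PulsarServiceNameResolver source: the class 
names `SharedResolverDemo` and `RoundRobinResolver`, the helper `hostsSeenBy`, 
and the broker addresses are all hypothetical. It also assumes the partitions 
retry in lockstep, which is the worst case; real interleaving is arbitrary, 
which is why the order looks random in practice.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a resolver that hands out hosts round-robin.
// It is NOT the real PulsarServiceNameResolver implementation.
public class SharedResolverDemo {

    static final class RoundRobinResolver {
        private final List<String> hosts;
        private int currentIndex = -1;

        RoundRobinResolver(List<String> hosts) {
            this.hosts = hosts;
        }

        // Every call advances the shared cursor, regardless of which caller asks.
        synchronized String resolveHost() {
            currentIndex = (currentIndex + 1) % hosts.size();
            return hosts.get(currentIndex);
        }
    }

    // Simulates `partitions` connections retrying in lockstep against ONE shared
    // resolver and returns the hosts handed to the connection number `observed`.
    static List<String> hostsSeenBy(int observed, int partitions, int rounds,
                                    List<String> hosts) {
        RoundRobinResolver shared = new RoundRobinResolver(hosts);
        List<String> seen = new ArrayList<>();
        for (int round = 0; round < rounds; round++) {
            for (int p = 0; p < partitions; p++) {
                String host = shared.resolveHost();
                if (p == observed) {
                    seen.add(host);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("broker-1:6650", "broker-2:6650", "broker-3:6650");

        // A single connection sees a true round-robin:
        // [broker-1:6650, broker-2:6650, broker-3:6650, broker-1:6650]
        System.out.println(hostsSeenBy(0, 1, 4, hosts));

        // With 3 connections sharing the resolver, connection 0 gets the same
        // host on every retry:
        // [broker-1:6650, broker-1:6650, broker-1:6650, broker-1:6650]
        System.out.println(hostsSeenBy(0, 3, 4, hosts));
    }
}
```

   In this model, if broker-1 is the broker that is down, connection 0 never 
reaches a healthy broker even though two are available. Whether a given 
partition gets stuck depends on how many callers advance the shared cursor 
between its own retries.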
   
   ### What did you expect to see?
   
   A single broker failure should not prevent clients from reconnecting; no 
single point of failure should exist.
   
   ### What did you see instead?
   
   A single point of failure: with one broker down, clients keep failing to 
reconnect.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
