snowcrumble opened a new issue #6123: Improve recover time from SPOF
URL: https://github.com/apache/pulsar/issues/6123

**Is your feature request related to a problem? Please describe.**
I tested a single point of failure (SPOF) by killing a broker that was serving pub/sub for a topic. My Pulsar cluster is deployed on k8s. The problem is that recovery takes about 1 minute on average. I found two possible reasons:
1. The broker's IP still existed in zk under "/loadbalance/brokers" until about 10 seconds after the broker went down, so I think the broker is not unregistered when it is killed.
2. The cpp client may have no connect timeout option. It spent about 30 seconds in `async_resolve` on a broker IP that no longer exists (a k8s pod's IP is removed from iptables after the pod is deleted).

**Describe the solution you'd like**
1. Actively unregister the broker IP when the broker is no longer serving.
2. Add a connect timeout option to the cpp client.

**Describe alternatives you've considered**

**Additional context**
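To illustrate the kind of bound requested in item 2, here is a minimal sketch of putting a deadline around a blocking resolve/connect step using only the C++ standard library. `connectWithTimeout`, `ConnectResult`, and the `blockingConnect` callback are hypothetical names for illustration and are not part of the Pulsar cpp client. The sketch only stops waiting; it does not cancel the underlying operation, which keeps running on a detached thread.

```cpp
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical result type, not a Pulsar client type.
enum class ConnectResult { Connected, Failed, TimedOut };

// Hypothetical helper: runs a blocking connect step on a worker thread and
// stops waiting for it once `timeout` has elapsed.
ConnectResult connectWithTimeout(std::function<bool()> blockingConnect,
                                 std::chrono::milliseconds timeout) {
    // The promise is shared so the worker can still complete safely
    // after the caller has given up and returned.
    auto promise = std::make_shared<std::promise<bool>>();
    std::future<bool> future = promise->get_future();

    std::thread([promise, blockingConnect]() {
        try {
            promise->set_value(blockingConnect());
        } catch (...) {
            // Forward any exception to the waiting caller.
            promise->set_exception(std::current_exception());
        }
    }).detach();

    if (future.wait_for(timeout) != std::future_status::ready) {
        return ConnectResult::TimedOut;  // the worker is abandoned, not cancelled
    }
    // get() rethrows if the connect step threw.
    return future.get() ? ConnectResult::Connected : ConnectResult::Failed;
}
```

With a bound like this, a caller could move on to another broker after a few seconds instead of waiting roughly 30 seconds on an address that no longer exists. In an asynchronous client, a timer that cancels the pending operation would likely be a better fit than a detached thread; this sketch only shows the deadline idea.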
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
