snowcrumble opened a new issue #6123: Improve recover time from SPOF
URL: https://github.com/apache/pulsar/issues/6123
 
 
   **Is your feature request related to a problem? Please describe.**
   I tested recovery from a single point of failure (SPOF) by killing a broker 
that was serving a topic with active pub/sub traffic. My Pulsar cluster is 
deployed on k8s.
   The problem is that recovery takes about 1 minute on average. I found some 
possible reasons:
   
   1. The broker's IP still existed in ZooKeeper under "/loadbalance/brokers" 
until about 10 seconds after the broker went down, so I think the broker is 
not unregistered when it is killed.
   
   2. The C++ client may have no connect timeout option. It spent about 30 
seconds in `async_resolve` on a broker IP that no longer existed (the k8s 
pod's IP is removed from iptables after the pod is deleted).
   
   **Describe the solution you'd like**
   1. Actively unregister the broker's IP when the broker is no longer serving.
   2. Add a connect timeout option to the C++ client.
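
To illustrate the second request, here is a minimal sketch of bounding a slow blocking step (such as resolving or connecting to a broker address) with a deadline. It uses only the C++17 standard library and is not the Pulsar client's code; `runWithTimeout` and `fakeResolve` are hypothetical names made up for this example. In an Asio-based client the more usual approach would be a timer that cancels the pending asynchronous operation, so treat this only as a picture of the intended behavior.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <optional>
#include <string>
#include <thread>
#include <type_traits>
#include <utility>

// Hypothetical helper: run a blocking callable on a detached worker thread and
// give up waiting after `timeout`. Returns std::nullopt on timeout.
// The callable must return a non-void value.
template <typename Fn>
auto runWithTimeout(Fn fn, std::chrono::milliseconds timeout)
    -> std::optional<std::invoke_result_t<Fn&>> {
    using Result = std::invoke_result_t<Fn&>;
    assert(timeout.count() >= 0);

    std::packaged_task<Result()> task(std::move(fn));
    std::future<Result> result = task.get_future();

    // The worker is detached, so on timeout it keeps running in the background
    // until the blocking call returns; only the caller stops waiting.
    std::thread(std::move(task)).detach();

    if (result.wait_for(timeout) != std::future_status::ready) {
        return std::nullopt;  // deadline expired
    }
    return result.get();
}

// Hypothetical stand-in for a slow resolve/connect step: sleeps for `delay`,
// then returns a made-up address.
std::string fakeResolve(std::chrono::milliseconds delay) {
    std::this_thread::sleep_for(delay);
    return "10.0.0.1";
}
```

A caller could write `runWithTimeout([] { return fakeResolve(std::chrono::seconds(30)); }, std::chrono::seconds(5))` and try another broker when the result is empty. Note that this sketch does not cancel the underlying operation; it only stops the caller from waiting on it.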
   
   **Describe alternatives you've considered**
   
   **Additional context**
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
