andrekramer1 commented on issue #11070: URL: https://github.com/apache/pulsar/issues/11070#issuecomment-936457985
Some further investigation found that the first ZooKeeper instance created for the Kubernetes StatefulSet does not respond to the readiness/liveness probe. The probe uses the "ruok" command, and the server replies by closing the connection (as ZooKeeper is not up and running). As a result, the second and third replicas are never created. Somehow ZooKeeper has stopped responding while initializing / creating a quorum.

This can be confirmed by setting the enabled flag on the ZooKeeper readiness and liveness probes to false in the Helm chart. With the probes disabled, I managed to initialize a 3-node cluster.

I created a debug branch of ZooKeeper, modified to respond to ruok and other client requests even when not fully initialized. With these changes it's also possible to bring up the ZooKeeper nodes and the Pulsar cluster with the probes enabled. The branch is here: https://github.com/andrekramer1/zookeeper/tree/early-ruok

It would be possible to create a pull request from this, but the implications of allowing client connections while ZooKeeper is initializing would need to be considered. Hopefully the change list can help fix this issue.
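For anyone wanting to reproduce the probe behaviour by hand, here is a minimal sketch of a "ruok" check in Python. It is not the chart's actual probe script. The function name `zk_ruok`, the default port 2181, and the timeout are assumptions made for illustration. It sends the four-letter command over a plain TCP connection and treats anything other than an "imok" reply (including the server simply closing the connection, as described above) as not ready.

```python
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 5.0) -> bool:
    """Return True only if the server answers 'ruok' with 'imok'.

    Hypothetical helper: the name, default port and timeout are
    illustrative assumptions, not taken from the Helm chart.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            # Read until the server closes the connection.
            while True:
                data = sock.recv(64)
                if not data:
                    break
                chunks.append(data)
    except OSError:
        # Connection refused, reset, or timed out: treat as not ready.
        return False
    # An empty reply means the server closed the connection without answering.
    return b"".join(chunks) == b"imok"


if __name__ == "__main__":
    # "localhost" is a placeholder; point this at the ZooKeeper pod or a port-forward.
    print(zk_ruok("localhost"))
```

Run against the first ZooKeeper instance (for example through a port-forward), this should return False while the server is closing connections during initialization. Note that recent ZooKeeper versions only answer four-letter commands that are enabled via `4lw.commands.whitelist`, so a False result can also mean "ruok" is not whitelisted.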
