roman-popenov opened a new pull request #6158: [Issue-5994][helm]: Start proxy pods when at least one broker pod is running URL: https://github.com/apache/pulsar/pull/6158 ### Motivation Fixes #5994: If the proxy service comes up before the brokers are up and reachable there will be HTTP 403 when running `bin/pulsar-admin` commands from inside the proxy pod. The proxy will also not be able to connect to the brokers when data is pushed through binary port with the following error: ```bash Caused by: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available ... 14 more Caused by: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available 22:11:07.633 [pulsar-web-32-6] INFO org.eclipse.jetty.server.RequestLog - 172.17.0.6 - - [24/Jan/2020:22:11:07 +0000] "PUT /admin/v2/persistent/public/functions/assignments HTTP/1.1" 500 2528 "-" "Pulsar-Java-v2.5.0" 280 ``` #### Workaround: Restart the proxy pods once brokers pods are running #### Proposed solution: Hold off starting of the proxies until at least one broker is reachable in the cluster. ### Modifications Changes are inside `proxy-deployment.yaml` helm template file that defines a new init container before proxy is started. The init container waits until broker is reachable using the nslookup on the broker service with a sleep of 30 seconds between retries and up to number of brokers times. Alternative solution that doesn't always work was `'until nslookup broker-service; sleep 2; done;', but 403 would still sometimes (could have been a fluke, but I saw it happening once). ### Verifying this change 1) Follow the instructions on how deploying helm and run: `helm install pulsar --values pulsar/values-mini.yaml ./pulsar/`. 2) Wait until all the services are up and running. 3) Connect to proxy pod and run `bin/pulsar-admin broker-stats monitoring-metrics` - no 403 or permission errors should arise 4) Set up tenant, namespace 5) Push data into a topic - No errors in the proxy logs and client is able to push data into cluster through proxies This change should already covered by existing tests? #### Modules affected: The changes in the PR are affecting the deployment using the helm charts. Now the pulsar Proxy pods will be started only when at least one broker pod is Running and reachable. ### Documentation Currently there is no detailed order of pod startup documented anywhere and why. It would be good to document this.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
