neerajmangal opened a new issue #477: controller keeps restarting during startup due to livenessProbe failure.
URL: https://github.com/apache/incubator-openwhisk-deploy-kube/issues/477

## Description

The controller is restarted by its `livenessProbe` if it takes more than 5 seconds to start up. Currently, `initialDelaySeconds` is set to 5 seconds; if the pod does not come up within that time, the probe fails and the kubelet kills and recreates the container.

## Logs

```bash
kubectl describe pod owstage-controller-0
Events:
  Type     Reason            Age                   From                        Message
  ----     ------            ----                  ----                        -------
  Normal   Scheduled         3m54s                 default-scheduler           Successfully assigned faask8s/owstage-controller-0 to <worker>.adobe.com
  Normal   Pulling           3m53s                 kubelet, <worker>.adobe.com pulling image "<dockerregisrty>.adobe.com/busybox"
  Normal   Pulled            3m52s                 kubelet, <worker>.adobe.com Successfully pulled image "<DockerRegisrty>.adobe.com/busybox"
  Normal   Created           3m52s                 kubelet, <worker>.adobe.com Created container
  Normal   Started           3m52s                 kubelet, <worker>.adobe.com Started container
  Normal   Pulling           3m20s                 kubelet, <worker>.adobe.com pulling image "<DockerRegisrty>.adobe.com/busybox"
  Normal   Started           3m19s                 kubelet, <worker>.adobe.com Started container
  Normal   Pulled            3m19s                 kubelet, <worker>.adobe.com Successfully pulled image "<DockerRegisrty>.adobe.com/busybox"
  Normal   Created           3m19s                 kubelet, <worker>.adobe.com Created container
  Normal   Pulling           86s                   kubelet, <worker>.adobe.com pulling image "<DockerRegisrty>.adobe.com/openwhisk/controller:d353d26"
  Normal   Pulled            86s                   kubelet, <worker>.adobe.com Successfully pulled image "<DockerRegisrty>.adobe.com/openwhisk/controller:d353d26"
  Normal   Created           86s                   kubelet, <worker>.adobe.com Created container
  Warning  DNSConfigForming  85s (x9 over 3m54s)   kubelet, <worker>.adobe.com Search Line limits were exceeded, some search paths have been omitted, the applied search line is: faask8s.svc.cluster.local svc.cluster.local cluster.local ***.adobe.com ***.adobe.com ***.adobe.com
  Normal   Started           85s                   kubelet, <worker>.adobe.com Started container
  Warning  Unhealthy         52s (x3 over 72s)     kubelet, <worker>.adobe.com Liveness probe failed: Get http://172.16.145.105:8080/ping: dial tcp 172.16.145.105:8080: connect: connection refused
  Normal   Killing           46s                   kubelet, <worker>.adobe.com Killing container with id docker://controller:Container failed liveness probe.. Container will be killed and recreated.
```

Controller logs (*`initialDelaySeconds` increased to 15 to capture the logs below*):

```bash
(base) Neerajs-MacBook-Pro:incubator-openwhisk-deploy-kube mangal$ kubectl logs owstage-controller-0 -f
Picked up JAVA_TOOL_OPTIONS: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
[2019-06-10T13:21:18.800Z] [INFO] Loaded metric reporter [kamon.statsd.StatsDReporter]
[2019-06-10T13:21:18.800Z] [INFO] Started the Kamon StatsD reporter
[2019-06-10T13:21:24.785Z] [INFO] Slf4jLogger started
[2019-06-10T13:21:31.282Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.triggers.fires.perMinute
[2019-06-10T13:21:31.283Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.sequence.maxLength
[2019-06-10T13:21:31.284Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.concurrent
[2019-06-10T13:21:31.285Z] [INFO] [#tid_sid_unknown] [Config] environment set value for controller.instances
[2019-06-10T13:21:31.377Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.perMinute
[2019-06-10T13:21:31.378Z] [INFO] [#tid_sid_unknown] [Config] environment set value for runtimes.manifest
[2019-06-10T13:21:31.380Z] [INFO] [#tid_sid_unknown] [Config] environment set value for kafka.hosts
[2019-06-10T13:21:31.381Z] [INFO] [#tid_sid_unknown] [Config] environment set value for port
[2019-06-10T13:21:34.777Z] [INFO] [#tid_sid_unknown] [Controller] Shutting down Kamon with coordinated shutdown
```

## Proposed Solution

- Make `initialDelaySeconds` configurable in the probes, so that users who hit a similar issue can adjust it to their requirements/environment. https://github.com/apache/incubator-openwhisk-deploy-kube/pull/471#issuecomment-497568781

  ```
  Probes:
    <component1>:
      livenessProbe:
        initialDelaySeconds: <#>
        ... other timing related configurations only if any.
      readinessProbe:
        initialDelaySeconds: <#>
        ... other timing related configurations only if any.
    <component2>:
      livenessProbe:
        initialDelaySeconds: <#>
        ... other timing related configurations only if any.
      readinessProbe:
        initialDelaySeconds: <#>
        ... other timing related configurations only if any.
  ```

- Add a `readinessProbe` in conjunction with the `livenessProbe`, so that the pod starts accepting traffic only once it is considered ready. (Not related to this issue, but I think it should be there.)
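
For illustration, here is a rough sketch of what the rendered controller container spec might look like once the probe timings are configurable. The `/ping` path and port `8080` come from the failing probe event above. Every timing value is an assumption chosen for the example, not a tested recommendation, and reusing `/ping` for the `readinessProbe` is likewise an assumption.

```yaml
# Illustrative fragment of a pod spec; all timing values are assumptions.
containers:
  - name: controller
    livenessProbe:
      httpGet:
        path: /ping              # endpoint from the "Liveness probe failed" event
        port: 8080
      initialDelaySeconds: 30    # assumed; the logs show startup taking well over 5s
      periodSeconds: 10          # assumed interval between probes
      timeoutSeconds: 1          # assumed per-probe timeout
      failureThreshold: 3        # consecutive failures before the container is restarted
    readinessProbe:
      httpGet:
        path: /ping              # assumption: same endpoint reused for readiness
        port: 8080
      initialDelaySeconds: 10    # assumed
      periodSeconds: 10          # assumed
```

With settings like these, the kubelet would wait 30 seconds before the first liveness check, while the `readinessProbe` would keep the pod out of service endpoints until `/ping` responds.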
