neerajmangal opened a new issue #477: controller keeps restarting during 
startup due to livenessProbe failure.
URL: https://github.com/apache/incubator-openwhisk-deploy-kube/issues/477
 
 
   ## Description
   
   The controller restarts because of a `livenessProbe` failure if it takes more 
than 5 seconds to start up. Currently, `initialDelaySeconds` is set to 5 seconds; 
if the pod does not come up within that time, the probe fails and the kubelet 
restarts the pod.
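   
   For reference, a probe definition that behaves this way might look like the 
sketch below. The `/ping` path and port `8080` are taken from the kubelet events 
in the Logs section; `periodSeconds` and `failureThreshold` are not stated in 
this issue and are shown at the Kubernetes defaults as an assumption.
   
   ```yaml
   # Sketch only: not copied from the chart.
   livenessProbe:
     httpGet:
       path: /ping            # from the "Liveness probe failed" event
       port: 8080             # from the "Liveness probe failed" event
     initialDelaySeconds: 5   # current value described above
     periodSeconds: 10        # assumed (Kubernetes default)
     failureThreshold: 3      # assumed (Kubernetes default)
   ```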
   
   ## Logs
   
   ```bash
   kubectl describe pod owstage-controller-0
   Events:
     Type     Reason            Age                  From                         Message
     ----     ------            ----                 ----                         -------
     Normal   Scheduled         3m54s                default-scheduler            Successfully assigned faask8s/owstage-controller-0 to <worker>.adobe.com
     Normal   Pulling           3m53s                kubelet, <worker>.adobe.com  pulling image "<dockerregisrty>.adobe.com/busybox"
     Normal   Pulled            3m52s                kubelet, <worker>.adobe.com  Successfully pulled image "<DockerRegisrty>.adobe.com/busybox"
     Normal   Created           3m52s                kubelet, <worker>.adobe.com  Created container
     Normal   Started           3m52s                kubelet, <worker>.adobe.com  Started container
     Normal   Pulling           3m20s                kubelet, <worker>.adobe.com  pulling image "<DockerRegisrty>.adobe.com/busybox"
     Normal   Started           3m19s                kubelet, <worker>.adobe.com  Started container
     Normal   Pulled            3m19s                kubelet, <worker>.adobe.com  Successfully pulled image "<DockerRegisrty>.adobe.com/busybox"
     Normal   Created           3m19s                kubelet, <worker>.adobe.com  Created container
     Normal   Pulling           86s                  kubelet, <worker>.adobe.com  pulling image "<DockerRegisrty>.adobe.com/openwhisk/controller:d353d26"
     Normal   Pulled            86s                  kubelet, <worker>.adobe.com  Successfully pulled image "<DockerRegisrty>.adobe.com/openwhisk/controller:d353d26"
     Normal   Created           86s                  kubelet, <worker>.adobe.com  Created container
     Warning  DNSConfigForming  85s (x9 over 3m54s)  kubelet, <worker>.adobe.com  Search Line limits were exceeded, some search paths have been omitted, the applied search line is: faask8s.svc.cluster.local svc.cluster.local cluster.local ***.adobe.com ***.adobe.com ***.adobe.com
     Normal   Started           85s                  kubelet, <worker>.adobe.com  Started container
     Warning  Unhealthy         52s (x3 over 72s)    kubelet, <worker>.adobe.com  Liveness probe failed: Get http://172.16.145.105:8080/ping: dial tcp 172.16.145.105:8080: connect: connection refused
     Normal   Killing           46s                  kubelet, <worker>.adobe.com  Killing container with id docker://controller:Container failed liveness probe.. Container will be killed and recreated.
   ```
   
   Controller logs (*`initialDelaySeconds` was increased to 15 to capture the 
logs below*):
   
   ```bash 
   
   (base) Neerajs-MacBook-Pro:incubator-openwhisk-deploy-kube mangal$ kubectl logs owstage-controller-0 -f
   Picked up JAVA_TOOL_OPTIONS: -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
   [2019-06-10T13:21:18.800Z] [INFO] Loaded metric reporter [kamon.statsd.StatsDReporter]
   [2019-06-10T13:21:18.800Z] [INFO] Started the Kamon StatsD reporter
   [2019-06-10T13:21:24.785Z] [INFO] Slf4jLogger started
   [2019-06-10T13:21:31.282Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.triggers.fires.perMinute
   [2019-06-10T13:21:31.283Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.sequence.maxLength
   [2019-06-10T13:21:31.284Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.concurrent
   [2019-06-10T13:21:31.285Z] [INFO] [#tid_sid_unknown] [Config] environment set value for controller.instances
   [2019-06-10T13:21:31.377Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.perMinute
   [2019-06-10T13:21:31.378Z] [INFO] [#tid_sid_unknown] [Config] environment set value for runtimes.manifest
   [2019-06-10T13:21:31.380Z] [INFO] [#tid_sid_unknown] [Config] environment set value for kafka.hosts
   [2019-06-10T13:21:31.381Z] [INFO] [#tid_sid_unknown] [Config] environment set value for port
   [2019-06-10T13:21:34.777Z] [INFO] [#tid_sid_unknown] [Controller] Shutting down Kamon with coordinated shutdown
   
   ```
   ## Proposed Solution
   
   - Make `initialDelaySeconds` configurable for the probes, so that users who 
hit a similar issue can tune it for their own requirements/environment. See 
https://github.com/apache/incubator-openwhisk-deploy-kube/pull/471#issuecomment-497568781
 
   
   ```
   Probes:
     <component1>:
       livenessProbe:
         initialDelaySeconds: <#>
         # ... other timing-related settings, if any
       readinessProbe:
         initialDelaySeconds: <#>
         # ... other timing-related settings, if any
     <component2>:
       livenessProbe:
         initialDelaySeconds: <#>
         # ... other timing-related settings, if any
       readinessProbe:
         initialDelaySeconds: <#>
         # ... other timing-related settings, if any
   ```
   - Add a `readinessProbe` alongside the `livenessProbe`, so that the pod 
starts accepting traffic only once it is considered ready. (Not related to this 
issue, but I think it should be there.)
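   
   As an illustration of the first proposal, a filled-in override for the 
controller could look like the sketch below. The `Probes.controller` key path 
and the readiness value are hypothetical, since the final layout is still open; 
the liveness value of 15 is the one used above to capture the controller logs.
   
   ```yaml
   # Hypothetical values override; key names are assumptions, not the chart's current schema.
   Probes:
     controller:
       livenessProbe:
         initialDelaySeconds: 15   # value used above to capture the controller logs
       readinessProbe:
         initialDelaySeconds: 15   # illustrative only
   ```
   
   An override like this would typically be supplied through a values file 
passed to `helm install` or `helm upgrade` with `-f`.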

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
