I found that the issue occurred only when pods were scheduled on one
particular node, and restarting Docker on that node fixed it. Do we really
need to restart Docker periodically? How are issues like this handled in
production clusters?
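
For anyone who wants to spot pods in this state quickly, here is a minimal
sketch of a shell filter. It assumes the default column layout of
kubectl get pods (NAME, READY, STATUS, ...) as in the output quoted below.
The function name stuck_pods is just my own choice, and it only lists
candidates; it does not tell you why they are stuck.

```shell
# stuck_pods: read `kubectl get pods` output on stdin and print the name and
# status of every pod whose STATUS column starts with "Init:" or equals
# "Terminating". Assumption: STATUS is the third whitespace-separated column.
stuck_pods() {
    awk '$3 ~ /^Init:/ || $3 == "Terminating" { print $1, $3 }'
}
```

Usage would be something like: kubectl get pods -n dev1 | stuck_pods
A pod that is legitimately still initializing also shows Init:0/1, so the
AGE column still needs a look before treating a match as hung.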

On Saturday, April 28, 2018 at 4:46:29 AM UTC+5:30, Vivek Kumar wrote:
> Hi All,
> I am facing a weird issue with my pods. I am launching around 20
> containers in my environment, and every time a random 3-4 of the pods hang
> with Init:0/1 status. On checking a stuck pod, the init container shows as
> Running, although it should terminate once its task has finished, and the
> app container shows Waiting with reason PodInitializing. The same init
> container image and spec are used across all 20 pods, but the issue hits
> different pods every time. When I delete these stuck pods, they get stuck
> in the Terminating state. If I SSH to the node where such a pod was
> launched and run docker ps, it shows the init container as running, but
> docker exec fails with an error saying the container does not exist. The
> init container pulls configs from a Consul server, and on checking the
> volume (found via docker inspect), I found that it had pulled all the
> key-value pairs correctly and saved them under the defined file name. I
> have checked resources on all the nodes, and more than enough are
> available on each.
> Below is a detailed example of one of the pods behaving like this.
> kubectl version 
> Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", 
> GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", 
> BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", 
> Platform:"linux/amd64"} 
> Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", 
> GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", 
> BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", 
> Platform:"linux/amd64"} 
> kubectl get pods -n dev1|grep -i session-service 
> session-service-app-75c9c8b5d9-dsmhp   0/1   Init:0/1      0   10h
> session-service-app-75c9c8b5d9-vq98k   0/1   Terminating   0   11h
> kubectl describe pods session-service-app-75c9c8b5d9-dsmhp -n dev1 
> Name:           session-service-app-75c9c8b5d9-dsmhp 
> Namespace:      dev1 
> Node:           ip-192-168-44-18.ap-southeast-1.compute.internal/
> Start Time:     Fri, 27 Apr 2018 18:14:43 +0530 
> Labels:         app=session-service-app 
>                 pod-template-hash=3175746185 
>                 release=session-service-app 
> Status:         Pending 
> IP:    
> Controlled By:  ReplicaSet/session-service-app-75c9c8b5d9 
> Init Containers: 
>   initpullconsulconfig: 
>     Container ID:  docker://c658d59995636e39c9d03b06e4973b6e32f818783a21ad292a2cf20d0e43bb02
>     Image: 
>     Image ID:      docker-pullable://
>     Port:          <none> 
>     Args: 
>       -consul-addr=consul-server.consul.svc.cluster.local:8500 
>     State:          Running 
>       Started:      Fri, 27 Apr 2018 18:14:44 +0530 
>     Ready:          False 
>     Restart Count:  0 
>     Environment: 
>       POD:                      sand 
>       SERVICE:                  session-service-app 
>       ENV:                      dev1 
>     Mounts: 
>       /var/lib/app from shared-volume-sidecar (rw) 
>       /var/run/secrets/ from default-token-bthkv (ro)
> Containers: 
>   session-service-app: 
>     Container ID: 
>     Image:          
>     Image ID: 
>     Port:           8080/TCP 
>     State:          Waiting 
>       Reason:       PodInitializing 
>     Ready:          False 
>     Restart Count:  0 
>     Environment:    <none> 
>     Mounts: 
>       /etc/appenv from shared-volume-sidecar (rw) 
>       /var/run/secrets/ from default-token-bthkv (ro)
> Conditions: 
>   Type           Status 
>   Initialized    False 
>   Ready          False 
>   PodScheduled   True 
> Volumes: 
>   shared-volume-sidecar: 
>     Type:    EmptyDir (a temporary directory that shares a pod's lifetime) 
>     Medium: 
>   default-token-bthkv: 
>     Type:        Secret (a volume populated by a Secret) 
>     SecretName:  default-token-bthkv 
>     Optional:    false 
> QoS Class:       BestEffort 
> Node-Selectors:  <none> 
> Tolerations: for 300s 
>         for 300s 
> Events:          <none> 
> sudo docker ps|grep -i session 
> c658d5999563        "/usr/bin/consul-t..."   10 hours ago   Up 10 hours   k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
> da120abd3dbb        "/pause"                 10 hours ago   Up 10 hours   k8s_POD_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
> f53d48c7d6ec        "/usr/bin/consul-t..."   10 hours ago   Up 10 hours   k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
> c26415458d39        "/pause"                 10 hours ago   Up 10 hours   k8s_POD_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
> sudo docker exec -it c658d5999563 bash 
> rpc error: code = 2 desc = containerd: container not found 

