Hi All,

I am facing a weird issue with my pods. I launch around 20 containers in my environment, and every time a random 3-4 of the pods hang in Init:0/1 status. When I check such a pod, the init container shows as Running, although it should terminate once its task is finished, and the app container shows Waiting with reason PodInitializing. The same init container image and spec are used across all 20 pods, but the issue hits different pods each time. When I delete the stuck pods, they get stuck in the Terminating state.

If I SSH to the node where the pod was scheduled and run docker ps, the init container is listed as running, but docker exec fails with an error saying the container does not exist. The init container pulls configs from the Consul server. When I checked its volume (path taken from docker inspect), I found that it had pulled all the key-value pairs correctly and saved them under the configured file name. I have also checked resources on all the nodes, and more than enough is available on each.

Below is a detailed example of one pod behaving like this.

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

$ kubectl get pods -n dev1|grep -i session-service
session-service-app-75c9c8b5d9-dsmhp   0/1   Init:0/1      0   10h
session-service-app-75c9c8b5d9-vq98k   0/1   Terminating   0   11h

$ kubectl describe pods session-service-app-75c9c8b5d9-dsmhp -n dev1
Name:           session-service-app-75c9c8b5d9-dsmhp
Namespace:      dev1
Node:           ip-192-168-44-18.ap-southeast-1.compute.internal/192.168.44.18
Start Time:     Fri, 27 Apr 2018 18:14:43 +0530
Labels:         app=session-service-app
                pod-template-hash=3175746185
                release=session-service-app
Status:         Pending
IP:             100.96.4.240
Controlled By:  ReplicaSet/session-service-app-75c9c8b5d9
Init Containers:
  initpullconsulconfig:
    Container ID:  docker://c658d59995636e39c9d03b06e4973b6e32f818783a21ad292a2cf20d0e43bb02
    Image:         shr-u-nexus-01.myops.de:8082/utils/app-init:1.0
    Image ID:      docker-pullable://shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd
    Port:          <none>
    Args:
      -consul-addr=consul-server.consul.svc.cluster.local:8500
    State:          Running
      Started:      Fri, 27 Apr 2018 18:14:44 +0530
    Ready:          False
    Restart Count:  0
    Environment:
      CONSUL_TEMPLATE_VERSION:  0.19.4
      POD:                      sand
      SERVICE:                  session-service-app
      ENV:                      dev1
    Mounts:
      /var/lib/app from shared-volume-sidecar (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro)
Containers:
  session-service-app:
    Container ID:
    Image:          shr-u-nexus-01.myops.de:8082/sand-images/sessionservice-init:sitv12
    Image ID:
    Port:           8080/TCP
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/appenv from shared-volume-sidecar (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro)
Conditions:
  Type           Status
  Initialized    False
  Ready          False
  PodScheduled   True
Volumes:
  shared-volume-sidecar:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-bthkv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bthkv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

$ sudo docker ps|grep -i session
c658d5999563  shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd  "/usr/bin/consul-t..."  10 hours ago  Up 10 hours  k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
da120abd3dbb  gcr.io/google_containers/pause-amd64:3.0  "/pause"  10 hours ago  Up 10 hours  k8s_POD_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0
f53d48c7d6ec  shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd  "/usr/bin/consul-t..."  10 hours ago  Up 10 hours  k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0
c26415458d39  gcr.io/google_containers/pause-amd64:3.0  "/pause"  10 hours ago  Up 10 hours  k8s_POD_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0

$ sudo docker exec -it c658d5999563 bash
rpc error: code = 2 desc = containerd: container not found
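
Since the affected pods change on every rollout, it may help to pick them out of the kubectl get pods listing automatically instead of grepping by hand. The following is only a minimal Python sketch under stated assumptions, not part of the setup described above: find_stuck_pods is a hypothetical helper name, and it assumes the default five-column listing (NAME READY STATUS RESTARTS AGE) printed by the kubectl version shown earlier. Newer kubectl releases can print extra text in the RESTARTS column, which this sketch would skip.

```python
import re

# Hypothetical helper, not part of kubectl: matches a STATUS value of the
# form Init:N/M (init containers still running) or Terminating.
STUCK_STATUS = re.compile(r"^(Init:\d+/\d+|Terminating)$")


def find_stuck_pods(listing):
    """Return (name, status, age) tuples for rows that look stuck.

    `listing` is the plain-text output of `kubectl get pods`, assumed to
    use the default columns: NAME READY STATUS RESTARTS AGE.
    """
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header row and any row that does not have five columns.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, _ready, status, _restarts, age = fields
        if STUCK_STATUS.match(status):
            stuck.append((name, status, age))
    return stuck


# Sample rows copied from the kubectl get pods output in this post.
SAMPLE = """\
session-service-app-75c9c8b5d9-dsmhp   0/1   Init:0/1      0   10h
session-service-app-75c9c8b5d9-vq98k   0/1   Terminating   0   11h
"""

if __name__ == "__main__":
    for name, status, age in find_stuck_pods(SAMPLE):
        print(f"{name}\t{status}\t{age}")
```

To run it against a live namespace, the SAMPLE string could be replaced with sys.stdin.read() and the output of kubectl get pods -n dev1 piped in. The AGE column still has to be judged by eye, because a pod that has only just started also shows Init:0/1 for a short while.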