Hello all, I'm having an unusual problem running Kubernetes on a cluster of four Raspberry Pi 3s: all outgoing network connections from inside pods are failing. My hunch is that the cause is something related to the overlay network (I'm using Flannel), but I'm really not sure. All of the relevant details I can think of follow. If anyone has an idea what the problem might be, or how I can debug it further, I'd be grateful!
The cluster is running on four brand new Raspberry Pi 3 Model B machines connected to my home network over Ethernet. Network requests work as expected from the host machines. The servers are all flashed with Hypriot OS v1.4.0 (https://github.com/hypriot/image-builder-rpi/releases/tag/v1.4.0), with Docker manually downgraded to v1.12.6, which is known to work with Kubernetes 1.6. Kubernetes is the only thing installed on these servers. Kubernetes 1.6.1 is installed with kubeadm 1.6.1, following the getting started guide exactly (https://kubernetes.io/docs/getting-started-guides/kubeadm/). Specifically, the kubeadm command I start with is:

`kubeadm init --apiserver-cert-extra-sans example.com --pod-network-cidr 10.244.0.0/16`

(where example.com is the public DNS record for my home network.)

RBAC roles are created for Flannel with `kubectl apply -f flannel-rbac.yml`, where the contents of the file are:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - update
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel
    namespace: kube-system

Flannel is deployed with `kubectl apply -f flannel.yml`, where the contents of the file are:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: arm
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      serviceAccountName: flannel
      containers:
        - name: kube-flannel
          image: quay.io/coreos/flannel:v0.7.0-arm
          command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
          securityContext:
            privileged: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: run
              mountPath: /run
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
        - name: install-cni
          image: quay.io/coreos/flannel:v0.7.0-arm
          command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
          volumeMounts:
            - name: cni
              mountPath: /etc/cni/net.d
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

All Kubernetes nodes are online (kube-01 is the master):

$ kubectl get nodes
NAME      STATUS    AGE       VERSION
kube-01   Ready     1d        v1.6.1
kube-02   Ready     1d        v1.6.1
kube-03   Ready     1d        v1.6.1
kube-04   Ready     1d        v1.6.1

Here are the details of the kube-02 node, as an example of what the nodes look like:

$ kubectl describe node kube-02
Name:               kube-02
Role:
Labels:             beta.kubernetes.io/arch=arm
                    beta.kubernetes.io/os=linux
                    ingress-controller=traefik
                    kubernetes.io/hostname=kube-02
Annotations:        flannel.alpha.coreos.com/backend-data={"VtepMAC":"7a:ce:5a:3b:78:80"}
                    flannel.alpha.coreos.com/backend-type=vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager=true
                    flannel.alpha.coreos.com/public-ip=10.0.1.102
                    node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Mon, 03 Apr 2017 22:46:36 -0700
Phase:
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  OutOfDisk       False   Wed, 05 Apr 2017 02:35:43 -0700  Mon, 03 Apr 2017 22:46:36 -0700  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Wed, 05 Apr 2017 02:35:43 -0700  Mon, 03 Apr 2017 22:46:36 -0700  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Wed, 05 Apr 2017 02:35:43 -0700  Mon, 03 Apr 2017 22:46:36 -0700  KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           True    Wed, 05 Apr 2017 02:35:43 -0700  Mon, 03 Apr 2017 22:47:38 -0700  KubeletReady                kubelet is posting ready status
Addresses:          10.0.1.102,10.0.1.102,kube-02
Capacity:
 cpu:       4
 memory:    882632Ki
 pods:      110
Allocatable:
 cpu:       4
 memory:    780232Ki
 pods:      110
System Info:
 Machine ID:                 9989a26f06984d6dbadc01770f018e3b
 System UUID:                9989a26f06984d6dbadc01770f018e3b
 Boot ID:                    4a400ae5-aaee-4c25-9125-4e0df445e064
 Kernel Version:             4.4.50-hypriotos-v7+
 OS Image:                   Raspbian GNU/Linux 8 (jessie)
 Operating System:           linux
 Architecture:               arm
 Container Runtime Version:  docker://1.12.6
 Kubelet Version:            v1.6.1
 Kube-Proxy Version:         v1.6.1
PodCIDR:                     10.244.1.0/24
ExternalID:                  kube-02
Non-terminated Pods:         (2 in total)
  Namespace    Name                   CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                   ------------  ----------  ---------------  -------------
  kube-system  kube-flannel-ds-p5l6q  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-proxy-z9dpz       0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  0 (0%)        0 (0%)      0 (0%)           0 (0%)
Events:         <none>

All pods, including kube-dns, are running as expected:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY     STATUS    RESTARTS   AGE
kube-system   etcd-kube-01                       1/1       Running   0          1d
kube-system   kube-apiserver-kube-01             1/1       Running   0          1d
kube-system   kube-controller-manager-kube-01    1/1       Running   0          1d
kube-system   kube-dns-279829092-wf67d           3/3       Running   0          1d
kube-system   kube-flannel-ds-g3dwn              2/2       Running   0          1d
kube-system   kube-flannel-ds-p5l6q              2/2       Running   2          1d
kube-system   kube-flannel-ds-sk2ln              2/2       Running   0          1d
kube-system   kube-flannel-ds-x5t2h              2/2       Running   3          1d
kube-system   kube-proxy-3c8s6                   1/1       Running   0          1d
kube-system   kube-proxy-kh0fh                   1/1       Running   0          1d
kube-system   kube-proxy-pgcz6                   1/1       Running   0          1d
kube-system   kube-proxy-z9dpz                   1/1       Running   0          1d
kube-system   kube-scheduler-kube-01             1/1       Running   0          1d

Services for the API server and DNS exist, as expected:

$ kubectl get svc --all-namespaces
NAMESPACE     NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       kubernetes   10.96.0.1    <none>        443/TCP         1d
kube-system   kube-dns     10.96.0.10   <none>        53/UDP,53/TCP   1d

And endpoints for those services exist, as expected:

$ kubectl get endpoints --all-namespaces
NAMESPACE     NAME                      ENDPOINTS                     AGE
default       kubernetes                10.0.1.101:6443               1d
kube-system   kube-controller-manager   <none>                        1d
kube-system   kube-dns                  10.244.0.2:53,10.244.0.2:53   1d
kube-system   kube-scheduler            <none>                        1d

Note that the API server is running on the host network, as this is how kubeadm sets up its static pod, while kube-dns is running on the overlay network.
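If it helps to see that distinction at a glance, I can also post the wide pod listing, which shows each pod's IP next to the node it runs on; host-network pods such as the API server report the node's own 10.0.1.x address, while pods on the overlay get a 10.244.x.x address:

$ kubectl get pods -o wide --all-namespaces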
Initially, I tried deploying a few other applications, including the Kubernetes Dashboard and Traefik (used as an ingress controller), but they produced errors in their logs about not being able to contact the API server, which was my first clue that something was wrong. Eventually, I reduced the problem to the following failing test case. The Docker image is https://hub.docker.com/r/jimmycuadra/rpi-debug/, which is just an ARM build of Alpine Linux with `dig` and `curl` installed in addition to the stock `nslookup`.

$ kubectl run debug --image jimmycuadra/rpi-debug --generator run-pod/v1 -o yaml --save-config --rm -it /bin/ash
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 0A:58:0A:F4:02:05
          inet addr:10.244.2.5  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::c49f:43ff:fece:b3c3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3323 (3.2 KiB)  TX bytes:578 (578.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.244.3.1      0.0.0.0         UG    0      0        0 eth0
10.244.0.0      10.244.3.1      255.255.0.0     UG    0      0        0 eth0
10.244.3.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0

/ # cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local webpass.net
options ndots:5

/ # cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1   localhost
::1         localhost ip6-localhost ip6-loopback
fe00::0     ip6-localnet
fe00::0     ip6-mcastprefix
fe00::1     ip6-allnodes
fe00::2     ip6-allrouters
10.244.2.4  debug

/ # nslookup kubernetes
;; connection timed out; no servers could be reached

/ # nslookup kubernetes.default.svc.cluster.local
;; connection timed out; no servers could be reached

/ # nslookup google.com
;; connection timed out; no servers could be reached

/ # curl -i --connect-timeout 15 -H "Host: www.google.com" https://216.58.192.14/
curl: (28) Connection timed out after 15001 milliseconds

/ # curl -i --connect-timeout 15 -H "Host: kubernetes" https://10.0.1.101:6443/
curl: (28) Connection timed out after 15001 milliseconds

/ # apk update
fetch http://nl.alpinelinux.org/alpine/edge/main/armhf/APKINDEX.tar.gz
ERROR: http://nl.alpinelinux.org/alpine/edge/main: temporary error (try again later)
v3.5.0-3172-gb55f907b71 [http://nl.alpinelinux.org/alpine/edge/main]
1 errors; 5526 distinct packages available

As you can see from the above session, the kube-dns DNS server is in /etc/resolv.conf as expected (10.96.0.10), but nslookup fails for the kubernetes name, both relative and fully qualified, as does nslookup on google.com. I also tried connecting directly to the IPs of Google and of the Kubernetes node running the API server, but no outgoing connections work. Even Alpine Linux's package manager, apk, cannot make an outgoing connection.
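If a packet capture would help narrow down where the traffic is being dropped, I can run something like the following on the node hosting the debug pod while repeating the curl test above. (I'm assuming Flannel's VXLAN backend is using its default UDP port 8472 for the node-to-node traffic; I haven't actually captured any of this yet.)

$ sudo tcpdump -ni cni0 host 10.244.2.5     # traffic from the pod arriving at the bridge
$ sudo tcpdump -ni flannel.1                # traffic headed to pods on other nodes (e.g. kube-dns at 10.244.0.2)
$ sudo tcpdump -ni eth0 udp port 8472       # encapsulated VXLAN packets actually leaving the node

That should show whether packets from the pod make it onto cni0 at all, and if so, where they disappear.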
Trying the same steps using the "Default" DNS policy for the pod reveals that DNS resolution and outgoing connections to the Internet still fail:

$ kubectl run debug --image jimmycuadra/rpi-debug --generator run-pod/v1 -o yaml --overrides '{"spec":{"dnsPolicy":"Default"}}' --save-config --rm -it /bin/ash
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 0A:58:0A:F4:01:05
          inet addr:10.244.1.5  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::34fc:c5ff:fef6:134/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3323 (3.2 KiB)  TX bytes:578 (578.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.244.1.1      0.0.0.0         UG    0      0        0 eth0
10.244.0.0      10.244.1.1      255.255.0.0     UG    0      0        0 eth0
10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0

/ # cat /etc/resolv.conf
nameserver 10.0.1.1
search webpass.net

/ # cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1   localhost
::1         localhost ip6-localhost ip6-loopback
fe00::0     ip6-localnet
fe00::0     ip6-mcastprefix
fe00::1     ip6-allnodes
fe00::2     ip6-allrouters
10.244.3.6  debug

/ # nslookup google.com
;; connection timed out; no servers could be reached

/ # curl -i --connect-timeout 15 -H "Host: www.google.com" https://216.58.192.14/
curl: (28) Connection timed out after 15000 milliseconds

/ # apk update
fetch http://nl.alpinelinux.org/alpine/edge/main/armhf/APKINDEX.tar.gz
ERROR: http://nl.alpinelinux.org/alpine/edge/main: temporary error (try again later)
v3.5.0-3172-gb55f907b71 [http://nl.alpinelinux.org/alpine/edge/main]
1 errors; 5526 distinct packages available

You can see that Flannel is operating, because this debug pod is given an IP within the pod network's CIDR (as kube-dns was):

$ kubectl describe pod debug
Name:           debug
Namespace:      default
Node:           kube-03/10.0.1.103
Start Time:     Wed, 05 Apr 2017 02:51:46 -0700
Labels:         <none>
Annotations:    kubectl.kubernetes.io/last-applied-configuration={"kind":"Pod","apiVersion":"v1","metadata":{"name":"debug","creationTimestamp":null},"spec":{"containers":[{"name":"debug","image":"jimmycuadra/rpi-deb...
Status:         Running
IP:             10.244.3.6
Controllers:    <none>
Containers:
  debug:
    Container ID:   docker://8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
    Image:          jimmycuadra/rpi-debug
    Image ID:       docker-pullable://jimmycuadra/rpi-debug@sha256:144cb3c504e691882034340890d58eac6ac7c11af482a645623c1cb33271ca5e
    Port:
    Args:
      /bin/ash
    State:          Running
      Started:      Wed, 05 Apr 2017 02:51:50 -0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-09gfc (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-09gfc:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-09gfc
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
                node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen  LastSeen  Count  From               SubObjectPath           Type    Reason     Message
  ---------  --------  -----  ----               -------------           ------  ------     -------
  2m         2m        1      default-scheduler                          Normal  Scheduled  Successfully assigned debug to kube-03
  2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Pulled     Container image "jimmycuadra/rpi-debug" already present on machine
  2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Created    Created container with id 8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
  2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Started    Started container with id 8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
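I can also post the per-node Flannel state if that would be useful. My understanding is that flanneld writes its lease to /run/flannel/subnet.env on each host, and that the VXLAN backend programs the flannel.1 device plus neighbor/FDB entries for the other nodes, so I could gather something like this from each machine (commands only; I haven't collected this output yet):

$ cat /run/flannel/subnet.env
$ ip route
$ ip -d link show flannel.1
$ bridge fdb show dev flannel.1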
Here is the beginning of the logs for kube-dns:

$ kubectl logs kube-dns-279829092-wf67d -c kubedns -n kube-system
I0404 05:46:45.782718       1 dns.go:49] version: v1.5.2-beta.0+$Format:%h$
I0404 05:46:45.793351       1 server.go:70] Using configuration read from directory: /kube-dns-config%!(EXTRA time.Duration=10s)
I0404 05:46:45.793794       1 server.go:112] FLAG: --alsologtostderr="false"
I0404 05:46:45.793942       1 server.go:112] FLAG: --config-dir="/kube-dns-config"
I0404 05:46:45.794033       1 server.go:112] FLAG: --config-map=""
I0404 05:46:45.794093       1 server.go:112] FLAG: --config-map-namespace="kube-system"
I0404 05:46:45.794159       1 server.go:112] FLAG: --config-period="10s"
I0404 05:46:45.794247       1 server.go:112] FLAG: --dns-bind-address="0.0.0.0"
I0404 05:46:45.794311       1 server.go:112] FLAG: --dns-port="10053"
I0404 05:46:45.794427       1 server.go:112] FLAG: --domain="cluster.local."
I0404 05:46:45.794509       1 server.go:112] FLAG: --federations=""
I0404 05:46:45.794582       1 server.go:112] FLAG: --healthz-port="8081"
I0404 05:46:45.794647       1 server.go:112] FLAG: --initial-sync-timeout="1m0s"
I0404 05:46:45.794722       1 server.go:112] FLAG: --kube-master-url=""
I0404 05:46:45.794795       1 server.go:112] FLAG: --kubecfg-file=""
I0404 05:46:45.794853       1 server.go:112] FLAG: --log-backtrace-at=":0"
I0404 05:46:45.794933       1 server.go:112] FLAG: --log-dir=""
I0404 05:46:45.795003       1 server.go:112] FLAG: --log-flush-frequency="5s"
I0404 05:46:45.795073       1 server.go:112] FLAG: --logtostderr="true"
I0404 05:46:45.795144       1 server.go:112] FLAG: --nameservers=""
I0404 05:46:45.795202       1 server.go:112] FLAG: --stderrthreshold="2"
I0404 05:46:45.795264       1 server.go:112] FLAG: --v="2"
I0404 05:46:45.795324       1 server.go:112] FLAG: --version="false"
I0404 05:46:45.795407       1 server.go:112] FLAG: --vmodule=""
I0404 05:46:45.795793       1 server.go:175] Starting SkyDNS server (0.0.0.0:10053)
I0404 05:46:45.800841       1 server.go:197] Skydns metrics enabled (/metrics:10055)
I0404 05:46:45.800982       1 dns.go:147] Starting endpointsController
I0404 05:46:45.801050       1 dns.go:150] Starting serviceController
I0404 05:46:45.802186       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0404 05:46:45.802431       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0404 05:46:46.194772       1 dns.go:264] New service: kubernetes
I0404 05:46:46.199497       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:46:46.201053       1 dns.go:264] New service: kube-dns
I0404 05:46:46.201745       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:46:46.202287       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:46:46.302608       1 dns.go:171] Initialized services and endpoints from apiserver
I0404 05:46:46.302733       1 server.go:128] Setting up Healthz Handler (/readiness)
I0404 05:46:46.302843       1 server.go:133] Setting up cache handler (/cache)
I0404 05:46:46.302935       1 server.go:119] Status HTTP port 8081
I0404 05:51:45.802627       1 dns.go:264] New service: kubernetes
I0404 05:51:45.803656       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:51:45.804266       1 dns.go:264] New service: kube-dns
I0404 05:51:45.804771       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:51:45.805283       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:54:12.745272       1 dns.go:264] New service: kubernetes-dashboard
I0404 05:56:45.805684       1 dns.go:264] New service: kubernetes
I0404 05:56:45.809947       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:56:45.811538       1 dns.go:264] New service: kube-dns
I0404 05:56:45.812488       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:56:45.813454       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:56:45.814443       1 dns.go:264] New service: kubernetes-dashboard
I0404 06:01:45.806051       1 dns.go:264] New service: kube-dns
I0404 06:01:45.806895       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 06:01:45.807408       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 06:01:45.807884       1 dns.go:264] New service: kubernetes-dashboard
I0404 06:01:45.808341       1 dns.go:264] New service: kubernetes
I0404 06:01:45.808752       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}

I don't see any errors in any of it, just an endless stream of it finding "kubernetes" and "kube-dns" as "new services" and adding SRV records for them.
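Since kube-dns itself looks healthy, one more thing I could try is querying it directly at its pod IP (10.244.0.2, per the endpoints above) rather than through the service VIP, to separate "DNS is broken" from "nothing can reach anything". For example, from the debug pod, which has dig installed (I haven't run this yet; I can post the results if it would help):

/ # dig @10.244.0.2 kubernetes.default.svc.cluster.local +time=2 +tries=1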
Here are the logs for Flannel on a node where the Flannel pod never restarted:

$ kubectl logs kube-flannel-ds-g3dwn -c kube-flannel -n kube-system
I0404 05:46:05.193078       1 kube.go:109] Waiting 10m0s for node controller to sync
I0404 05:46:05.193340       1 kube.go:289] starting kube subnet manager
I0404 05:46:06.194279       1 kube.go:116] Node controller sync successful
I0404 05:46:06.194463       1 main.go:132] Installing signal handlers
I0404 05:46:06.196013       1 manager.go:136] Determining IP address of default interface
I0404 05:46:06.199502       1 manager.go:149] Using interface with name eth0 and address 10.0.1.101
I0404 05:46:06.199681       1 manager.go:166] Defaulting external address to interface address (10.0.1.101)
I0404 05:46:06.631802       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0404 05:46:06.665265       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I0404 05:46:06.700650       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I0404 05:46:06.720807       1 manager.go:250] Lease acquired: 10.244.0.0/24
I0404 05:46:06.722263       1 network.go:58] Watching for L3 misses
I0404 05:46:06.722473       1 network.go:66] Watching for new subnet leases
I0405 04:46:06.678418       1 network.go:160] Lease renewed, new expiration: 2017-04-06 04:46:06.652848051 +0000 UTC

Here are logs from a failed Flannel pod on one of the nodes where it has restarted a few times:

$ kubectl logs kube-flannel-ds-x5t2h -c kube-flannel -n kube-system -p
E0404 05:50:02.782218       1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-x5t2h': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-x5t2h: dial tcp 10.96.0.1:443: i/o timeout
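Since the Flannel DaemonSet runs with hostNetwork: true, I believe that timeout happened in the host's network namespace, which would mean the service VIP wasn't reachable even from the node itself at that point. I can re-test that directly from each host if useful; my expectation is that the API server would at least answer (even if only with an authorization error) to something like:

$ curl -k --connect-timeout 5 https://10.96.0.1:443/version
$ curl -k --connect-timeout 5 https://10.0.1.101:6443/version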
Here are the iptables rules that appear identically on all four servers:

$ sudo iptables-save
# Generated by iptables-save v1.4.21 on Wed Apr 5 10:01:19 2017
*nat
:PREROUTING ACCEPT [3:372]
:INPUT ACCEPT [3:372]
:OUTPUT ACCEPT [26:1659]
:POSTROUTING ACCEPT [26:1659]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-HHOMLR7ARJQ6WUFK - [0:0]
:KUBE-SEP-IT2ZTR26TO4XFPTO - [0:0]
:KUBE-SEP-YIL6JZP7A3QYXJU2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
-A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-HHOMLR7ARJQ6WUFK -s 10.0.1.101/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-HHOMLR7ARJQ6WUFK -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-HHOMLR7ARJQ6WUFK --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.0.1.101:6443
-A KUBE-SEP-IT2ZTR26TO4XFPTO -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-IT2ZTR26TO4XFPTO -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-IT2ZTR26TO4XFPTO
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-HHOMLR7ARJQ6WUFK --mask 255.255.255.255 --rsource -j KUBE-SEP-HHOMLR7ARJQ6WUFK
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-HHOMLR7ARJQ6WUFK
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-YIL6JZP7A3QYXJU2
COMMIT
# Completed on Wed Apr 5 10:01:19 2017
# Generated by iptables-save v1.4.21 on Wed Apr 5 10:01:19 2017
*filter
:INPUT ACCEPT [1943:614999]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [1949:861554]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -j KUBE-FIREWALL
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
COMMIT
# Completed on Wed Apr 5 10:01:19 2017
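One thing I notice in the filter table is that the FORWARD chain's policy is DROP and the only ACCEPT rules there are Docker's (matching docker0, not cni0), so I'm wondering whether traffic forwarded from cni0 out to eth0 or flannel.1 is simply being dropped there. If that theory is worth checking, I can watch the chain's packet/byte counters while repeating the failing curl, and confirm the forwarding sysctls:

$ sudo iptables -vnL FORWARD
$ sudo sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables

I can post that output too if anyone thinks it's relevant.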
Here is the output of ifconfig on the server running the Kubernetes master components (kube-01):

$ ifconfig
cni0      Link encap:Ethernet  HWaddr 0a:58:0a:f4:00:01
          inet addr:10.244.0.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::807b:3bff:fedf:ff7d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:322236 errors:0 dropped:0 overruns:0 frame:0
          TX packets:331776 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:74133093 (70.6 MiB)  TX bytes:73272040 (69.8 MiB)

docker0   Link encap:Ethernet  HWaddr 02:42:43:63:54:be
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr b8:27:eb:fa:0d:18
          inet addr:10.0.1.101  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::ba27:ebff:fefa:d18/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1594829 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1234243 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:482745836 (460.3 MiB)  TX bytes:943355891 (899.6 MiB)

flannel.1 Link encap:Ethernet  HWaddr 7a:54:f6:da:6b:a0
          inet addr:10.244.0.0  Bcast:0.0.0.0  Mask:255.255.255.255
          inet6 addr: fe80::7854:f6ff:feda:6ba0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:38 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:204 (204.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:5523912 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5523912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:2042135076 (1.9 GiB)  TX bytes:2042135076 (1.9 GiB)

vethbe064275 Link encap:Ethernet  HWaddr 1e:4f:ea:70:9f:e1
          inet6 addr: fe80::1c4f:eaff:fe70:9fe1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:322237 errors:0 dropped:0 overruns:0 frame:0
          TX packets:331794 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:78644487 (75.0 MiB)  TX bytes:73275343 (69.8 MiB)

And here it is on the worker node kube-02:

$ ifconfig
cni0      Link encap:Ethernet  HWaddr 0a:58:0a:f4:01:01
          inet addr:10.244.1.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::383a:41ff:fea4:f113/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:125 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7794 (7.6 KiB)  TX bytes:7391 (7.2 KiB)

docker0   Link encap:Ethernet  HWaddr 02:42:ad:1b:1e:a3
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr b8:27:eb:bb:ff:69
          inet addr:10.0.1.102  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::ba27:ebff:febb:ff69/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:750764 errors:0 dropped:0 overruns:0 frame:0
          TX packets:442199 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:597869801 (570.1 MiB)  TX bytes:42574858 (40.6 MiB)

flannel.1 Link encap:Ethernet  HWaddr 7a:ce:5a:3b:78:80
          inet addr:10.244.1.0  Bcast:0.0.0.0  Mask:255.255.255.255
          inet6 addr: fe80::78ce:5aff:fe3b:7880/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:38 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:240 (240.0 B)  TX bytes:240 (240.0 B)

Again, if anyone has made it this far, please let me know if you have any ideas, or if there are other commands I can show the output of to help narrow it down!

Thanks very much,
Jimmy