I see the flannel masquerade rule for inbound traffic (-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE) but not the one for outbound traffic (I would expect -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE).
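
A quick way to compare against your own nodes is to list the POSTROUTING rules directly. This is just a rough check, assuming the 10.244.0.0/16 pod CIDR used in your setup; the manual rule at the end is a temporary test, not something flannel documents as a fix:

    $ sudo iptables -t nat -S POSTROUTING | grep 10.244.0.0/16
    # With --ip-masq, the flannel v0.7.0 log further down reports installing
    # these three rules:
    #   -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
    #   -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
    #   ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE

    # If the pod-to-external MASQUERADE rule is missing, it can be added by hand
    # purely as an experiment (flannel normally manages these rules itself):
    $ sudo iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE

If adding that rule makes egress from pods start working, the masquerade path is the culprit; if not, the problem more likely lies in forwarding or the vxlan device.
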
On Wed, Apr 5, 2017 at 3:16 AM, <jimmycua...@gmail.com> wrote:
> Hello all,
>
> I'm having an unusual problem with running Kubernetes on a cluster of four
> Raspberry Pi 3s: all outgoing networking connections from inside pods are
> failing. My hunch is that the cause of the problem is something related to
> the overlay network (I'm using Flannel) but I am really not sure. All of the
> relevant details I can think of follow. If anyone has an idea what the
> problem might be or how I can debug it further, I'd be grateful!
>
> The cluster is running on four brand new Raspberry Pi 3 Model B machines
> connected to my home network using Ethernet. Network requests work as
> expected from the host machines.
>
> The servers are all flashed with Hypriot OS v1.4.0
> (https://github.com/hypriot/image-builder-rpi/releases/tag/v1.4.0) with
> Docker manually downgraded to v1.12.6, which is known to work with Kubernetes
> 1.6. Kubernetes is the only thing installed on these servers.
>
> Kubernetes 1.6.1 is installed with kubeadm 1.6.1 following the getting
> started guide exactly (https://kubernetes.io/docs/getting-started-guides/kubeadm/).
> Specifically, the kubeadm command I start with is:
> `kubeadm init --apiserver-cert-extra-sans example.com --pod-network-cidr 10.244.0.0/16`
> (where example.com is the public DNS record for my home network).
>
> RBAC roles are created for Flannel with `kubectl apply -f flannel-rbac.yml`,
> where the contents of the file are:
>
> ---
> kind: ClusterRole
> apiVersion: rbac.authorization.k8s.io/v1beta1
> metadata:
>   name: flannel
> rules:
>   - apiGroups:
>       - ""
>     resources:
>       - pods
>     verbs:
>       - get
>   - apiGroups:
>       - ""
>     resources:
>       - nodes
>     verbs:
>       - list
>       - update
>       - watch
> ---
> kind: ClusterRoleBinding
> apiVersion: rbac.authorization.k8s.io/v1beta1
> metadata:
>   name: flannel
> roleRef:
>   apiGroup: rbac.authorization.k8s.io
>   kind: ClusterRole
>   name: flannel
> subjects:
>   - kind: ServiceAccount
>     name: flannel
>     namespace: kube-system
>
> Flannel is deployed with `kubectl apply -f flannel.yml`, where the contents
> of the file are:
>
> ---
> apiVersion: v1
> kind: ServiceAccount
> metadata:
>   name: flannel
>   namespace: kube-system
> ---
> kind: ConfigMap
> apiVersion: v1
> metadata:
>   name: kube-flannel-cfg
>   namespace: kube-system
>   labels:
>     tier: node
>     app: flannel
> data:
>   cni-conf.json: |
>     {
>       "name": "cbr0",
>       "type": "flannel",
>       "delegate": {
>         "isDefaultGateway": true
>       }
>     }
>   net-conf.json: |
>     {
>       "Network": "10.244.0.0/16",
>       "Backend": {
>         "Type": "vxlan"
>       }
>     }
> ---
> apiVersion: extensions/v1beta1
> kind: DaemonSet
> metadata:
>   name: kube-flannel-ds
>   namespace: kube-system
>   labels:
>     tier: node
>     app: flannel
> spec:
>   template:
>     metadata:
>       labels:
>         tier: node
>         app: flannel
>     spec:
>       hostNetwork: true
>       nodeSelector:
>         beta.kubernetes.io/arch: arm
>       tolerations:
>         - key: node-role.kubernetes.io/master
>           effect: NoSchedule
>       serviceAccountName: flannel
>       containers:
>         - name: kube-flannel
>           image: quay.io/coreos/flannel:v0.7.0-arm
>           command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
>           securityContext:
>             privileged: true
>           env:
>             - name: POD_NAME
>               valueFrom:
>                 fieldRef:
>                   fieldPath: metadata.name
>             - name: POD_NAMESPACE
>               valueFrom:
>                 fieldRef:
>                   fieldPath: metadata.namespace
>           volumeMounts:
>             - name: run
>               mountPath: /run
>             - name: flannel-cfg
>               mountPath: /etc/kube-flannel/
>         - name: install-cni
>           image: quay.io/coreos/flannel:v0.7.0-arm
>           command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
>           volumeMounts:
>             - name: cni
>               mountPath: /etc/cni/net.d
>             - name: flannel-cfg
>               mountPath: /etc/kube-flannel/
>       volumes:
>         - name: run
>           hostPath:
>             path: /run
>         - name: cni
>           hostPath:
>             path: /etc/cni/net.d
>         - name: flannel-cfg
>           configMap:
>             name: kube-flannel-cfg
>
> All Kubernetes nodes are online (kube-01 is the master):
>
> $ kubectl get nodes
> NAME      STATUS    AGE       VERSION
> kube-01   Ready     1d        v1.6.1
> kube-02   Ready     1d        v1.6.1
> kube-03   Ready     1d        v1.6.1
> kube-04   Ready     1d        v1.6.1
>
> Here are the details of the kube-02 node, just as an example to show the
> node details:
>
> $ kubectl describe node kube-02
> Name:            kube-02
> Role:
> Labels:          beta.kubernetes.io/arch=arm
>                  beta.kubernetes.io/os=linux
>                  ingress-controller=traefik
>                  kubernetes.io/hostname=kube-02
> Annotations:     flannel.alpha.coreos.com/backend-data={"VtepMAC":"7a:ce:5a:3b:78:80"}
>                  flannel.alpha.coreos.com/backend-type=vxlan
>                  flannel.alpha.coreos.com/kube-subnet-manager=true
>                  flannel.alpha.coreos.com/public-ip=10.0.1.102
>                  node.alpha.kubernetes.io/ttl=0
>                  volumes.kubernetes.io/controller-managed-attach-detach=true
> Taints:          <none>
> CreationTimestamp:  Mon, 03 Apr 2017 22:46:36 -0700
> Phase:
> Conditions:
>   Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
>   ----            ------  -----------------                 ------------------                ------                      -------
>   OutOfDisk       False   Wed, 05 Apr 2017 02:35:43 -0700   Mon, 03 Apr 2017 22:46:36 -0700   KubeletHasSufficientDisk    kubelet has sufficient disk space available
>   MemoryPressure  False   Wed, 05 Apr 2017 02:35:43 -0700   Mon, 03 Apr 2017 22:46:36 -0700   KubeletHasSufficientMemory  kubelet has sufficient memory available
>   DiskPressure    False   Wed, 05 Apr 2017 02:35:43 -0700   Mon, 03 Apr 2017 22:46:36 -0700   KubeletHasNoDiskPressure    kubelet has no disk pressure
>   Ready           True    Wed, 05 Apr 2017 02:35:43 -0700   Mon, 03 Apr 2017 22:47:38 -0700   KubeletReady                kubelet is posting ready status
> Addresses:       10.0.1.102,10.0.1.102,kube-02
> Capacity:
>  cpu:     4
>  memory:  882632Ki
>  pods:    110
> Allocatable:
>  cpu:     4
>  memory:  780232Ki
>  pods:    110
> System Info:
>  Machine ID:                 9989a26f06984d6dbadc01770f018e3b
>  System UUID:                9989a26f06984d6dbadc01770f018e3b
>  Boot ID:                    4a400ae5-aaee-4c25-9125-4e0df445e064
>  Kernel Version:             4.4.50-hypriotos-v7+
>  OS Image:                   Raspbian GNU/Linux 8 (jessie)
>  Operating System:           linux
>  Architecture:               arm
>  Container Runtime Version:  docker://1.12.6
>  Kubelet Version:            v1.6.1
>  Kube-Proxy Version:         v1.6.1
> PodCIDR:      10.244.1.0/24
> ExternalID:   kube-02
> Non-terminated Pods:  (2 in total)
>   Namespace    Name                    CPU Requests  CPU Limits  Memory Requests  Memory Limits
>   ---------    ----                    ------------  ----------  ---------------  -------------
>   kube-system  kube-flannel-ds-p5l6q   0 (0%)        0 (0%)      0 (0%)           0 (0%)
>   kube-system  kube-proxy-z9dpz        0 (0%)        0 (0%)      0 (0%)           0 (0%)
> Allocated resources:
>   (Total limits may be over 100 percent, i.e., overcommitted.)
>   CPU Requests  CPU Limits  Memory Requests  Memory Limits
>   ------------  ----------  ---------------  -------------
>   0 (0%)        0 (0%)      0 (0%)           0 (0%)
> Events:  <none>
>
> All pods, including kube-dns, are running as expected:
>
> $ kubectl get pods --all-namespaces
> NAMESPACE     NAME                              READY     STATUS    RESTARTS   AGE
> kube-system   etcd-kube-01                      1/1       Running   0          1d
> kube-system   kube-apiserver-kube-01            1/1       Running   0          1d
> kube-system   kube-controller-manager-kube-01   1/1       Running   0          1d
> kube-system   kube-dns-279829092-wf67d          3/3       Running   0          1d
> kube-system   kube-flannel-ds-g3dwn             2/2       Running   0          1d
> kube-system   kube-flannel-ds-p5l6q             2/2       Running   2          1d
> kube-system   kube-flannel-ds-sk2ln             2/2       Running   0          1d
> kube-system   kube-flannel-ds-x5t2h             2/2       Running   3          1d
> kube-system   kube-proxy-3c8s6                  1/1       Running   0          1d
> kube-system   kube-proxy-kh0fh                  1/1       Running   0          1d
> kube-system   kube-proxy-pgcz6                  1/1       Running   0          1d
> kube-system   kube-proxy-z9dpz                  1/1       Running   0          1d
> kube-system   kube-scheduler-kube-01            1/1       Running   0          1d
>
> Services for the API server and DNS exist, as expected:
>
> $ kubectl get svc --all-namespaces
> NAMESPACE     NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
> default       kubernetes   10.96.0.1    <none>        443/TCP         1d
> kube-system   kube-dns     10.96.0.10   <none>        53/UDP,53/TCP   1d
>
> And endpoints for those services exist, as expected:
>
> $ kubectl get endpoints --all-namespaces
> NAMESPACE     NAME                      ENDPOINTS                     AGE
> default       kubernetes                10.0.1.101:6443               1d
> kube-system   kube-controller-manager   <none>                        1d
> kube-system   kube-dns                  10.244.0.2:53,10.244.0.2:53   1d
> kube-system   kube-scheduler            <none>                        1d
>
> Note that the API server is running on the host network, as this is how
> kubeadm sets up its static pod, while kube-dns is running on the overlay
> network.
>
> Initially, I tried deploying a few other applications, including the
> Kubernetes Dashboard and Traefik (used as an ingress controller), but they
> produced errors in their logs about not being able to contact the API
> server, which was my first clue that something was wrong. Eventually, I
> reduced the problem to the following failing test case. The Docker image is
> https://hub.docker.com/r/jimmycuadra/rpi-debug/, which is just an ARM build
> of Alpine Linux with `dig` and `curl` installed in addition to the stock
> `nslookup`.
>
> $ kubectl run debug --image jimmycuadra/rpi-debug --generator run-pod/v1 -o yaml --save-config --rm -it /bin/ash
> If you don't see a command prompt, try pressing enter.
> / # ifconfig
> eth0      Link encap:Ethernet  HWaddr 0A:58:0A:F4:02:05
>           inet addr:10.244.2.5  Bcast:0.0.0.0  Mask:255.255.255.0
>           inet6 addr: fe80::c49f:43ff:fece:b3c3/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
>           RX packets:18 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:3323 (3.2 KiB)  TX bytes:578 (578.0 B)
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
> / # route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         10.244.3.1      0.0.0.0         UG    0      0        0 eth0
> 10.244.0.0      10.244.3.1      255.255.0.0     UG    0      0        0 eth0
> 10.244.3.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
> / # cat /etc/resolv.conf
> nameserver 10.96.0.10
> search default.svc.cluster.local svc.cluster.local cluster.local webpass.net
> options ndots:5
> / # cat /etc/hosts
> # Kubernetes-managed hosts file.
> 127.0.0.1       localhost
> ::1     localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> fe00::0 ip6-mcastprefix
> fe00::1 ip6-allnodes
> fe00::2 ip6-allrouters
> 10.244.2.4      debug
> / # nslookup kubernetes
> ;; connection timed out; no servers could be reached
>
> / # nslookup kubernetes.default.svc.cluster.local
> ;; connection timed out; no servers could be reached
>
> / # nslookup google.com
> ;; connection timed out; no servers could be reached
>
> / # curl -i --connect-timeout 15 -H "Host: www.google.com" https://216.58.192.14/
> curl: (28) Connection timed out after 15001 milliseconds
> / # curl -i --connect-timeout 15 -H "Host: kubernetes" https://10.0.1.101:6443/
> curl: (28) Connection timed out after 15001 milliseconds
> / # apk update
> fetch http://nl.alpinelinux.org/alpine/edge/main/armhf/APKINDEX.tar.gz
> ERROR: http://nl.alpinelinux.org/alpine/edge/main: temporary error (try again later)
> v3.5.0-3172-gb55f907b71 [http://nl.alpinelinux.org/alpine/edge/main]
> 1 errors; 5526 distinct packages available
>
> As you can see from the above session, the kube-dns DNS server is in
> /etc/resolv.conf as expected (10.96.0.10), but nslookup fails for the
> kubernetes name, both relative and fully qualified, as does nslookup on
> google.com. I also tried using the IP of Google and of the Kubernetes node
> running the API server manually, but no outgoing connections work. Even
> Alpine Linux's package manager, apk, cannot make an outgoing connection.
>
> Trying the same steps using the "Default" DNS policy for the pod reveals
> that DNS resolution and outgoing connections to the Internet still fail:
>
> $ kubectl run debug --image jimmycuadra/rpi-debug --generator run-pod/v1 -o yaml --overrides '{"spec":{"dnsPolicy":"Default"}}' --save-config --rm -it /bin/ash
> If you don't see a command prompt, try pressing enter.
> / # ifconfig
> eth0      Link encap:Ethernet  HWaddr 0A:58:0A:F4:01:05
>           inet addr:10.244.1.5  Bcast:0.0.0.0  Mask:255.255.255.0
>           inet6 addr: fe80::34fc:c5ff:fef6:134/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
>           RX packets:18 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:3323 (3.2 KiB)  TX bytes:578 (578.0 B)
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
> / # route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         10.244.1.1      0.0.0.0         UG    0      0        0 eth0
> 10.244.0.0      10.244.1.1      255.255.0.0     UG    0      0        0 eth0
> 10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
> / # cat /etc/resolv.conf
> nameserver 10.0.1.1
> search webpass.net
> / # cat /etc/hosts
> # Kubernetes-managed hosts file.
> 127.0.0.1       localhost
> ::1     localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> fe00::0 ip6-mcastprefix
> fe00::1 ip6-allnodes
> fe00::2 ip6-allrouters
> 10.244.3.6      debug
> / # nslookup google.com
> ;; connection timed out; no servers could be reached
>
> / # curl -i --connect-timeout 15 -H "Host: www.google.com" https://216.58.192.14/
> curl: (28) Connection timed out after 15000 milliseconds
> / # apk update
> fetch http://nl.alpinelinux.org/alpine/edge/main/armhf/APKINDEX.tar.gz
> ERROR: http://nl.alpinelinux.org/alpine/edge/main: temporary error (try again later)
> v3.5.0-3172-gb55f907b71 [http://nl.alpinelinux.org/alpine/edge/main]
> 1 errors; 5526 distinct packages available
>
> You can see that Flannel is operating, because this debug pod is given an IP
> within the pod network's CIDR (as kube-dns was):
>
> $ kubectl describe pod debug
> Name:         debug
> Namespace:    default
> Node:         kube-03/10.0.1.103
> Start Time:   Wed, 05 Apr 2017 02:51:46 -0700
> Labels:       <none>
> Annotations:  kubectl.kubernetes.io/last-applied-configuration={"kind":"Pod","apiVersion":"v1","metadata":{"name":"debug","creationTimestamp":null},"spec":{"containers":[{"name":"debug","image":"jimmycuadra/rpi-deb...
> Status:       Running
> IP:           10.244.3.6
> Controllers:  <none>
> Containers:
>   debug:
>     Container ID:  docker://8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
>     Image:         jimmycuadra/rpi-debug
>     Image ID:      docker-pullable://jimmycuadra/rpi-debug@sha256:144cb3c504e691882034340890d58eac6ac7c11af482a645623c1cb33271ca5e
>     Port:
>     Args:
>       /bin/ash
>     State:          Running
>       Started:      Wed, 05 Apr 2017 02:51:50 -0700
>     Ready:          True
>     Restart Count:  0
>     Environment:    <none>
>     Mounts:
>       /var/run/secrets/kubernetes.io/serviceaccount from default-token-09gfc (ro)
> Conditions:
>   Type           Status
>   Initialized    True
>   Ready          True
>   PodScheduled   True
> Volumes:
>   default-token-09gfc:
>     Type:        Secret (a volume populated by a Secret)
>     SecretName:  default-token-09gfc
>     Optional:    false
> QoS Class:       BestEffort
> Node-Selectors:  <none>
> Tolerations:     node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
>                  node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
> Events:
>   FirstSeen  LastSeen  Count  From               SubObjectPath           Type    Reason     Message
>   ---------  --------  -----  ----               -------------           ------  ------     -------
>   2m         2m        1      default-scheduler                          Normal  Scheduled  Successfully assigned debug to kube-03
>   2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Pulled     Container image "jimmycuadra/rpi-debug" already present on machine
>   2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Created    Created container with id 8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
>   2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Started    Started container with id 8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
>
> Here is the beginning of the logs for kube-dns:
>
> $ kubectl logs kube-dns-279829092-wf67d -c kubedns -n kube-system
> I0404 05:46:45.782718       1 dns.go:49] version: v1.5.2-beta.0+$Format:%h$
> I0404 05:46:45.793351       1 server.go:70] Using configuration read from directory: /kube-dns-config%!(EXTRA time.Duration=10s)
> I0404 05:46:45.793794       1 server.go:112] FLAG: --alsologtostderr="false"
> I0404 05:46:45.793942       1 server.go:112] FLAG: --config-dir="/kube-dns-config"
> I0404 05:46:45.794033       1 server.go:112] FLAG: --config-map=""
> I0404 05:46:45.794093       1 server.go:112] FLAG: --config-map-namespace="kube-system"
> I0404 05:46:45.794159       1 server.go:112] FLAG: --config-period="10s"
> I0404 05:46:45.794247       1 server.go:112] FLAG: --dns-bind-address="0.0.0.0"
> I0404 05:46:45.794311       1 server.go:112] FLAG: --dns-port="10053"
> I0404 05:46:45.794427       1 server.go:112] FLAG: --domain="cluster.local."
> I0404 05:46:45.794509       1 server.go:112] FLAG: --federations=""
> I0404 05:46:45.794582       1 server.go:112] FLAG: --healthz-port="8081"
> I0404 05:46:45.794647       1 server.go:112] FLAG: --initial-sync-timeout="1m0s"
> I0404 05:46:45.794722       1 server.go:112] FLAG: --kube-master-url=""
> I0404 05:46:45.794795       1 server.go:112] FLAG: --kubecfg-file=""
> I0404 05:46:45.794853       1 server.go:112] FLAG: --log-backtrace-at=":0"
> I0404 05:46:45.794933       1 server.go:112] FLAG: --log-dir=""
> I0404 05:46:45.795003       1 server.go:112] FLAG: --log-flush-frequency="5s"
> I0404 05:46:45.795073       1 server.go:112] FLAG: --logtostderr="true"
> I0404 05:46:45.795144       1 server.go:112] FLAG: --nameservers=""
> I0404 05:46:45.795202       1 server.go:112] FLAG: --stderrthreshold="2"
> I0404 05:46:45.795264       1 server.go:112] FLAG: --v="2"
> I0404 05:46:45.795324       1 server.go:112] FLAG: --version="false"
> I0404 05:46:45.795407       1 server.go:112] FLAG: --vmodule=""
> I0404 05:46:45.795793       1 server.go:175] Starting SkyDNS server (0.0.0.0:10053)
> I0404 05:46:45.800841       1 server.go:197] Skydns metrics enabled (/metrics:10055)
> I0404 05:46:45.800982       1 dns.go:147] Starting endpointsController
> I0404 05:46:45.801050       1 dns.go:150] Starting serviceController
> I0404 05:46:45.802186       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
> I0404 05:46:45.802431       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
> I0404 05:46:46.194772       1 dns.go:264] New service: kubernetes
> I0404 05:46:46.199497       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:46:46.201053       1 dns.go:264] New service: kube-dns
> I0404 05:46:46.201745       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:46:46.202287       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:46:46.302608       1 dns.go:171] Initialized services and endpoints from apiserver
> I0404 05:46:46.302733       1 server.go:128] Setting up Healthz Handler (/readiness)
> I0404 05:46:46.302843       1 server.go:133] Setting up cache handler (/cache)
> I0404 05:46:46.302935       1 server.go:119] Status HTTP port 8081
> I0404 05:51:45.802627       1 dns.go:264] New service: kubernetes
> I0404 05:51:45.803656       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:51:45.804266       1 dns.go:264] New service: kube-dns
> I0404 05:51:45.804771       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:51:45.805283       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:54:12.745272       1 dns.go:264] New service: kubernetes-dashboard
> I0404 05:56:45.805684       1 dns.go:264] New service: kubernetes
> I0404 05:56:45.809947       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:56:45.811538       1 dns.go:264] New service: kube-dns
> I0404 05:56:45.812488       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:56:45.813454       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 05:56:45.814443       1 dns.go:264] New service: kubernetes-dashboard
> I0404 06:01:45.806051       1 dns.go:264] New service: kube-dns
> I0404 06:01:45.806895       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 06:01:45.807408       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
> I0404 06:01:45.807884       1 dns.go:264] New service: kubernetes-dashboard
> I0404 06:01:45.808341       1 dns.go:264] New service: kubernetes
> I0404 06:01:45.808752       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
>
> I don't see any errors in any of it, just an endless stream of it finding
> "kubernetes" and "kube-dns" as "new services" and adding SRV records for
> them.
>
> Here are the logs for Flannel on a node where the Flannel pod never
> restarted:
>
> $ kubectl logs kube-flannel-ds-g3dwn -c kube-flannel -n kube-system
> I0404 05:46:05.193078       1 kube.go:109] Waiting 10m0s for node controller to sync
> I0404 05:46:05.193340       1 kube.go:289] starting kube subnet manager
> I0404 05:46:06.194279       1 kube.go:116] Node controller sync successful
> I0404 05:46:06.194463       1 main.go:132] Installing signal handlers
> I0404 05:46:06.196013       1 manager.go:136] Determining IP address of default interface
> I0404 05:46:06.199502       1 manager.go:149] Using interface with name eth0 and address 10.0.1.101
> I0404 05:46:06.199681       1 manager.go:166] Defaulting external address to interface address (10.0.1.101)
> I0404 05:46:06.631802       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
> I0404 05:46:06.665265       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
> I0404 05:46:06.700650       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
> I0404 05:46:06.720807       1 manager.go:250] Lease acquired: 10.244.0.0/24
> I0404 05:46:06.722263       1 network.go:58] Watching for L3 misses
> I0404 05:46:06.722473       1 network.go:66] Watching for new subnet leases
> I0405 04:46:06.678418       1 network.go:160] Lease renewed, new expiration: 2017-04-06 04:46:06.652848051 +0000 UTC
>
> Here are logs from a failed Flannel pod on one of the nodes where it's
> restarted a few times:
>
> $ kubectl logs kube-flannel-ds-x5t2h -c kube-flannel -n kube-system -p
> E0404 05:50:02.782218       1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-x5t2h': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-x5t2h: dial tcp 10.96.0.1:443: i/o timeout
>
> Here are the iptables rules that appear identically on all four servers:
>
> $ sudo iptables-save
> # Generated by iptables-save v1.4.21 on Wed Apr 5 10:01:19 2017
> *nat
> :PREROUTING ACCEPT [3:372]
> :INPUT ACCEPT [3:372]
> :OUTPUT ACCEPT [26:1659]
> :POSTROUTING ACCEPT [26:1659]
> :DOCKER - [0:0]
> :KUBE-MARK-DROP - [0:0]
> :KUBE-MARK-MASQ - [0:0]
> :KUBE-NODEPORTS - [0:0]
> :KUBE-POSTROUTING - [0:0]
> :KUBE-SEP-HHOMLR7ARJQ6WUFK - [0:0]
> :KUBE-SEP-IT2ZTR26TO4XFPTO - [0:0]
> :KUBE-SEP-YIL6JZP7A3QYXJU2 - [0:0]
> :KUBE-SERVICES - [0:0]
> :KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
> :KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
> :KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
> -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
> -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
> -A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
> -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
> -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
> -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
> -A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
> -A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
> -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
> -A DOCKER -i docker0 -j RETURN
> -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
> -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
> -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
> -A KUBE-SEP-HHOMLR7ARJQ6WUFK -s 10.0.1.101/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
> -A KUBE-SEP-HHOMLR7ARJQ6WUFK -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-HHOMLR7ARJQ6WUFK --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.0.1.101:6443
> -A KUBE-SEP-IT2ZTR26TO4XFPTO -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
> -A KUBE-SEP-IT2ZTR26TO4XFPTO -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.2:53
> -A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
> -A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.2:53
> -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
> -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
> -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
> -A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
> -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
> -A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
> -A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
> -A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-IT2ZTR26TO4XFPTO
> -A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-HHOMLR7ARJQ6WUFK --mask 255.255.255.255 --rsource -j KUBE-SEP-HHOMLR7ARJQ6WUFK
> -A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-HHOMLR7ARJQ6WUFK
> -A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-YIL6JZP7A3QYXJU2
> COMMIT
> # Completed on Wed Apr 5 10:01:19 2017
> # Generated by iptables-save v1.4.21 on Wed Apr 5 10:01:19 2017
> *filter
> :INPUT ACCEPT [1943:614999]
> :FORWARD DROP [0:0]
> :OUTPUT ACCEPT [1949:861554]
> :DOCKER - [0:0]
> :DOCKER-ISOLATION - [0:0]
> :KUBE-FIREWALL - [0:0]
> :KUBE-SERVICES - [0:0]
> -A INPUT -j KUBE-FIREWALL
> -A FORWARD -j DOCKER-ISOLATION
> -A FORWARD -o docker0 -j DOCKER
> -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
> -A FORWARD -i docker0 -o docker0 -j ACCEPT
> -A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
> -A OUTPUT -j KUBE-FIREWALL
> -A DOCKER-ISOLATION -j RETURN
> -A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
> COMMIT
> # Completed on Wed Apr 5 10:01:19 2017
>
> Here is the output of ifconfig on the server running the Kubernetes master
> components (kube-01):
>
> $ ifconfig
> cni0      Link encap:Ethernet  HWaddr 0a:58:0a:f4:00:01
>           inet addr:10.244.0.1  Bcast:0.0.0.0  Mask:255.255.255.0
>           inet6 addr: fe80::807b:3bff:fedf:ff7d/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
>           RX packets:322236 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:331776 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:74133093 (70.6 MiB)  TX bytes:73272040 (69.8 MiB)
>
> docker0   Link encap:Ethernet  HWaddr 02:42:43:63:54:be
>           inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
>           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>
> eth0      Link encap:Ethernet  HWaddr b8:27:eb:fa:0d:18
>           inet addr:10.0.1.101  Bcast:10.0.1.255  Mask:255.255.255.0
>           inet6 addr: fe80::ba27:ebff:fefa:d18/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1594829 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1234243 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:482745836 (460.3 MiB)  TX bytes:943355891 (899.6 MiB)
>
> flannel.1 Link encap:Ethernet  HWaddr 7a:54:f6:da:6b:a0
>           inet addr:10.244.0.0  Bcast:0.0.0.0  Mask:255.255.255.255
>           inet6 addr: fe80::7854:f6ff:feda:6ba0/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
>           RX packets:3 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:38 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:204 (204.0 B)  TX bytes:0 (0.0 B)
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:5523912 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:5523912 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1
>           RX bytes:2042135076 (1.9 GiB)  TX bytes:2042135076 (1.9 GiB)
>
> vethbe064275 Link encap:Ethernet  HWaddr 1e:4f:ea:70:9f:e1
>           inet6 addr: fe80::1c4f:eaff:fe70:9fe1/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
>           RX packets:322237 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:331794 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:78644487 (75.0 MiB)  TX bytes:73275343 (69.8 MiB)
>
> And here it is on the worker node kube-02:
>
> $ ifconfig
> cni0      Link encap:Ethernet  HWaddr 0a:58:0a:f4:01:01
>           inet addr:10.244.1.1  Bcast:0.0.0.0  Mask:255.255.255.0
>           inet6 addr: fe80::383a:41ff:fea4:f113/64 Scope:Link
>           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:125 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:51 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:7794 (7.6 KiB)  TX bytes:7391 (7.2 KiB)
>
> docker0   Link encap:Ethernet  HWaddr 02:42:ad:1b:1e:a3
>           inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
>           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>
> eth0      Link encap:Ethernet  HWaddr b8:27:eb:bb:ff:69
>           inet addr:10.0.1.102  Bcast:10.0.1.255  Mask:255.255.255.0
>           inet6 addr: fe80::ba27:ebff:febb:ff69/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:750764 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:442199 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:597869801 (570.1 MiB)  TX bytes:42574858 (40.6 MiB)
>
> flannel.1 Link encap:Ethernet  HWaddr 7a:ce:5a:3b:78:80
>           inet addr:10.244.1.0  Bcast:0.0.0.0  Mask:255.255.255.255
>           inet6 addr: fe80::78ce:5aff:fe3b:7880/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:38 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:4 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1
>           RX bytes:240 (240.0 B)  TX bytes:240 (240.0 B)
>
> Again, if anyone has made it this far, please let me know if you have any
> ideas, or if there are other commands I can show the output of to help
> narrow it down!
>
> Thanks very much,
> Jimmy