Hello all, I'm having an unusual problem running Kubernetes on a cluster of four Raspberry Pi 3s: all outgoing network connections from inside pods are failing. My hunch is that the cause is something related to the overlay network (I'm using Flannel), but I'm really not sure. All of the relevant details I can think of follow. If anyone has an idea what the problem might be, or how I can debug it further, I'd be grateful!
The cluster is running on four brand new Raspberry Pi 3 Model B machines connected to my home network over Ethernet. Network requests work as expected from the host machines. The servers are all flashed with Hypriot OS v1.4.0 (https://github.com/hypriot/image-builder-rpi/releases/tag/v1.4.0), with Docker manually downgraded to v1.12.6, which is known to work with Kubernetes 1.6. Kubernetes is the only thing installed on these servers. Kubernetes 1.6.1 is installed with kubeadm 1.6.1, following the getting started guide exactly (https://kubernetes.io/docs/getting-started-guides/kubeadm/). Specifically, the kubeadm command I start with is:

`kubeadm init --apiserver-cert-extra-sans example.com --pod-network-cidr 10.244.0.0/16`

(where example.com is the public DNS record for my home network.)

RBAC roles are created for Flannel with `kubectl apply -f flannel-rbac.yml`, where the contents of the file are:

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - update
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel
    namespace: kube-system

Flannel is deployed with `kubectl apply -f flannel.yml`, where the contents of the file are:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "type": "flannel",
      "delegate": {
        "isDefaultGateway": true
      }
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      hostNetwork: true
      nodeSelector:
        beta.kubernetes.io/arch: arm
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
      serviceAccountName: flannel
      containers:
        - name: kube-flannel
          image: quay.io/coreos/flannel:v0.7.0-arm
          command: [ "/opt/bin/flanneld", "--ip-masq", "--kube-subnet-mgr" ]
          securityContext:
            privileged: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: run
              mountPath: /run
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
        - name: install-cni
          image: quay.io/coreos/flannel:v0.7.0-arm
          command: [ "/bin/sh", "-c", "set -e -x; cp -f /etc/kube-flannel/cni-conf.json /etc/cni/net.d/10-flannel.conf; while true; do sleep 3600; done" ]
          volumeMounts:
            - name: cni
              mountPath: /etc/cni/net.d
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      volumes:
        - name: run
          hostPath:
            path: /run
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg

All Kubernetes nodes are online (kube-01 is the master):

$ kubectl get nodes
NAME      STATUS    AGE       VERSION
kube-01   Ready     1d        v1.6.1
kube-02   Ready     1d        v1.6.1
kube-03   Ready     1d        v1.6.1
kube-04   Ready     1d        v1.6.1

Here are the details of the kube-02 node, as an example of what the nodes look like:

$ kubectl describe node kube-02
Name:               kube-02
Role:
Labels:             beta.kubernetes.io/arch=arm
                    beta.kubernetes.io/os=linux
                    ingress-controller=traefik
                    kubernetes.io/hostname=kube-02
Annotations:        flannel.alpha.coreos.com/backend-data={"VtepMAC":"7a:ce:5a:3b:78:80"}
                    flannel.alpha.coreos.com/backend-type=vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager=true
                    flannel.alpha.coreos.com/public-ip=10.0.1.102
                    node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Mon, 03 Apr 2017 22:46:36 -0700
Phase:
Conditions:
  Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
  ----            ------  -----------------                ------------------               ------                      -------
  OutOfDisk       False   Wed, 05 Apr 2017 02:35:43 -0700  Mon, 03 Apr 2017 22:46:36 -0700  KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Wed, 05 Apr 2017 02:35:43 -0700  Mon, 03 Apr 2017 22:46:36 -0700  KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Wed, 05 Apr 2017 02:35:43 -0700  Mon, 03 Apr 2017 22:46:36 -0700  KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           True    Wed, 05 Apr 2017 02:35:43 -0700  Mon, 03 Apr 2017 22:47:38 -0700  KubeletReady                kubelet is posting ready status
Addresses:          10.0.1.102,10.0.1.102,kube-02
Capacity:
 cpu:       4
 memory:    882632Ki
 pods:      110
Allocatable:
 cpu:       4
 memory:    780232Ki
 pods:      110
System Info:
 Machine ID:                 9989a26f06984d6dbadc01770f018e3b
 System UUID:                9989a26f06984d6dbadc01770f018e3b
 Boot ID:                    4a400ae5-aaee-4c25-9125-4e0df445e064
 Kernel Version:             4.4.50-hypriotos-v7+
 OS Image:                   Raspbian GNU/Linux 8 (jessie)
 Operating System:           linux
 Architecture:               arm
 Container Runtime Version:  docker://1.12.6
 Kubelet Version:            v1.6.1
 Kube-Proxy Version:         v1.6.1
PodCIDR:                     10.244.1.0/24
ExternalID:                  kube-02
Non-terminated Pods:         (2 in total)
  Namespace    Name                   CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                   ------------  ----------  ---------------  -------------
  kube-system  kube-flannel-ds-p5l6q  0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-proxy-z9dpz       0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  0 (0%)        0 (0%)      0 (0%)           0 (0%)
Events:         <none>

All pods, including kube-dns, are running as expected:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY     STATUS    RESTARTS   AGE
kube-system   etcd-kube-01                       1/1       Running   0          1d
kube-system   kube-apiserver-kube-01             1/1       Running   0          1d
kube-system   kube-controller-manager-kube-01    1/1       Running   0          1d
kube-system   kube-dns-279829092-wf67d           3/3       Running   0          1d
kube-system   kube-flannel-ds-g3dwn              2/2       Running   0          1d
kube-system   kube-flannel-ds-p5l6q              2/2       Running   2          1d
kube-system   kube-flannel-ds-sk2ln              2/2       Running   0          1d
kube-system   kube-flannel-ds-x5t2h              2/2       Running   3          1d
kube-system   kube-proxy-3c8s6                   1/1       Running   0          1d
kube-system   kube-proxy-kh0fh                   1/1       Running   0          1d
kube-system   kube-proxy-pgcz6                   1/1       Running   0          1d
kube-system   kube-proxy-z9dpz                   1/1       Running   0          1d
kube-system   kube-scheduler-kube-01             1/1       Running   0          1d

Services for the API server and DNS exist, as expected:

$ kubectl get svc --all-namespaces
NAMESPACE     NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       kubernetes   10.96.0.1    <none>        443/TCP         1d
kube-system   kube-dns     10.96.0.10   <none>        53/UDP,53/TCP   1d

And endpoints for those services exist, as expected:

$ kubectl get endpoints --all-namespaces
NAMESPACE     NAME                      ENDPOINTS                     AGE
default       kubernetes                10.0.1.101:6443               1d
kube-system   kube-controller-manager   <none>                        1d
kube-system   kube-dns                  10.244.0.2:53,10.244.0.2:53   1d
kube-system   kube-scheduler            <none>                        1d

Note that the API server is running on the host network, as this is how kubeadm sets up its static pod, while kube-dns is running on the overlay network.
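If it helps to see that distinction at a glance, I can also post the wide pod listing, which shows each pod's IP next to the node it runs on; host-network pods such as the API server report the node's own 10.0.1.x address, while pods on the overlay get a 10.244.x.x address:

$ kubectl get pods -o wide --all-namespaces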
Initially, I tried deploying a few other applications, including the Kubernetes Dashboard and Traefik (used as an ingress controller), but they produced errors in their logs about not being able to contact the API server, which was my first clue that something was wrong. Eventually, I reduced the problem to the following failing test case. The Docker image is https://hub.docker.com/r/jimmycuadra/rpi-debug/, which is just an ARM build of Alpine Linux with `dig` and `curl` installed in addition to the stock `nslookup`.

$ kubectl run debug --image jimmycuadra/rpi-debug --generator run-pod/v1 -o yaml --save-config --rm -it /bin/ash
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 0A:58:0A:F4:02:05
          inet addr:10.244.2.5  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::c49f:43ff:fece:b3c3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3323 (3.2 KiB)  TX bytes:578 (578.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.244.3.1      0.0.0.0         UG    0      0        0 eth0
10.244.0.0      10.244.3.1      255.255.0.0     UG    0      0        0 eth0
10.244.3.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0

/ # cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local webpass.net
options ndots:5

/ # cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1   localhost
::1         localhost ip6-localhost ip6-loopback
fe00::0     ip6-localnet
fe00::0     ip6-mcastprefix
fe00::1     ip6-allnodes
fe00::2     ip6-allrouters
10.244.2.4  debug

/ # nslookup kubernetes
;; connection timed out; no servers could be reached

/ # nslookup kubernetes.default.svc.cluster.local
;; connection timed out; no servers could be reached

/ # nslookup google.com
;; connection timed out; no servers could be reached

/ # curl -i --connect-timeout 15 -H "Host: www.google.com" https://216.58.192.14/
curl: (28) Connection timed out after 15001 milliseconds

/ # curl -i --connect-timeout 15 -H "Host: kubernetes" https://10.0.1.101:6443/
curl: (28) Connection timed out after 15001 milliseconds

/ # apk update
fetch http://nl.alpinelinux.org/alpine/edge/main/armhf/APKINDEX.tar.gz
ERROR: http://nl.alpinelinux.org/alpine/edge/main: temporary error (try again later)
v3.5.0-3172-gb55f907b71 [http://nl.alpinelinux.org/alpine/edge/main]
1 errors; 5526 distinct packages available

As you can see from the above session, the kube-dns DNS server is in /etc/resolv.conf as expected (10.96.0.10), but nslookup fails for the kubernetes name, both relative and fully qualified, as does nslookup on google.com. I also tried connecting directly to the IPs of Google and of the Kubernetes node running the API server, but no outgoing connections work. Even Alpine Linux's package manager, apk, cannot make an outgoing connection.
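If a packet capture would help narrow down where the traffic is being dropped, I can run something like the following on the node hosting the debug pod while repeating the curl test above. (I'm assuming Flannel's VXLAN backend is using its default UDP port 8472 for the node-to-node traffic; I haven't actually captured any of this yet.)

$ sudo tcpdump -ni cni0 host 10.244.2.5     # traffic from the pod arriving at the bridge
$ sudo tcpdump -ni flannel.1                # traffic headed to pods on other nodes (e.g. kube-dns at 10.244.0.2)
$ sudo tcpdump -ni eth0 udp port 8472       # encapsulated VXLAN packets actually leaving the node

That should show whether packets from the pod make it onto cni0 at all, and if so, where they disappear.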
Trying the same steps using the "Default" DNS policy for the pod reveals that DNS resolution and outgoing connections to the Internet still fail:

$ kubectl run debug --image jimmycuadra/rpi-debug --generator run-pod/v1 -o yaml --overrides '{"spec":{"dnsPolicy":"Default"}}' --save-config --rm -it /bin/ash
If you don't see a command prompt, try pressing enter.
/ # ifconfig
eth0      Link encap:Ethernet  HWaddr 0A:58:0A:F4:01:05
          inet addr:10.244.1.5  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::34fc:c5ff:fef6:134/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3323 (3.2 KiB)  TX bytes:578 (578.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.244.1.1      0.0.0.0         UG    0      0        0 eth0
10.244.0.0      10.244.1.1      255.255.0.0     UG    0      0        0 eth0
10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0

/ # cat /etc/resolv.conf
nameserver 10.0.1.1
search webpass.net

/ # cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1   localhost
::1         localhost ip6-localhost ip6-loopback
fe00::0     ip6-localnet
fe00::0     ip6-mcastprefix
fe00::1     ip6-allnodes
fe00::2     ip6-allrouters
10.244.3.6  debug

/ # nslookup google.com
;; connection timed out; no servers could be reached

/ # curl -i --connect-timeout 15 -H "Host: www.google.com" https://216.58.192.14/
curl: (28) Connection timed out after 15000 milliseconds

/ # apk update
fetch http://nl.alpinelinux.org/alpine/edge/main/armhf/APKINDEX.tar.gz
ERROR: http://nl.alpinelinux.org/alpine/edge/main: temporary error (try again later)
v3.5.0-3172-gb55f907b71 [http://nl.alpinelinux.org/alpine/edge/main]
1 errors; 5526 distinct packages available

You can see that Flannel is operating, because this debug pod is given an IP within the pod network's CIDR (as kube-dns was):

$ kubectl describe pod debug
Name:           debug
Namespace:      default
Node:           kube-03/10.0.1.103
Start Time:     Wed, 05 Apr 2017 02:51:46 -0700
Labels:         <none>
Annotations:    kubectl.kubernetes.io/last-applied-configuration={"kind":"Pod","apiVersion":"v1","metadata":{"name":"debug","creationTimestamp":null},"spec":{"containers":[{"name":"debug","image":"jimmycuadra/rpi-deb...
Status:         Running
IP:             10.244.3.6
Controllers:    <none>
Containers:
  debug:
    Container ID:   docker://8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
    Image:          jimmycuadra/rpi-debug
    Image ID:       docker-pullable://jimmycuadra/rpi-debug@sha256:144cb3c504e691882034340890d58eac6ac7c11af482a645623c1cb33271ca5e
    Port:
    Args:
      /bin/ash
    State:          Running
      Started:      Wed, 05 Apr 2017 02:51:50 -0700
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-09gfc (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-09gfc:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-09gfc
    Optional:   false
QoS Class:      BestEffort
Node-Selectors: <none>
Tolerations:    node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
                node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen  LastSeen  Count  From               SubObjectPath           Type    Reason     Message
  ---------  --------  -----  ----               -------------           ------  ------     -------
  2m         2m        1      default-scheduler                          Normal  Scheduled  Successfully assigned debug to kube-03
  2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Pulled     Container image "jimmycuadra/rpi-debug" already present on machine
  2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Created    Created container with id 8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
  2m         2m        1      kubelet, kube-03   spec.containers{debug}  Normal  Started    Started container with id 8c24be5df5b1f526b901b912c654b63705122b64c194a9556d8453573755c752
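I can also post the per-node Flannel state if that would be useful. My understanding is that flanneld writes its lease to /run/flannel/subnet.env on each host, and that the VXLAN backend programs the flannel.1 device plus neighbor/FDB entries for the other nodes, so I could gather something like this from each machine (commands only; I haven't collected this output yet):

$ cat /run/flannel/subnet.env
$ ip route
$ ip -d link show flannel.1
$ bridge fdb show dev flannel.1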
Here is the beginning of the logs for kube-dns:

$ kubectl logs kube-dns-279829092-wf67d -c kubedns -n kube-system
I0404 05:46:45.782718       1 dns.go:49] version: v1.5.2-beta.0+$Format:%h$
I0404 05:46:45.793351       1 server.go:70] Using configuration read from directory: /kube-dns-config%!(EXTRA time.Duration=10s)
I0404 05:46:45.793794       1 server.go:112] FLAG: --alsologtostderr="false"
I0404 05:46:45.793942       1 server.go:112] FLAG: --config-dir="/kube-dns-config"
I0404 05:46:45.794033       1 server.go:112] FLAG: --config-map=""
I0404 05:46:45.794093       1 server.go:112] FLAG: --config-map-namespace="kube-system"
I0404 05:46:45.794159       1 server.go:112] FLAG: --config-period="10s"
I0404 05:46:45.794247       1 server.go:112] FLAG: --dns-bind-address="0.0.0.0"
I0404 05:46:45.794311       1 server.go:112] FLAG: --dns-port="10053"
I0404 05:46:45.794427       1 server.go:112] FLAG: --domain="cluster.local."
I0404 05:46:45.794509       1 server.go:112] FLAG: --federations=""
I0404 05:46:45.794582       1 server.go:112] FLAG: --healthz-port="8081"
I0404 05:46:45.794647       1 server.go:112] FLAG: --initial-sync-timeout="1m0s"
I0404 05:46:45.794722       1 server.go:112] FLAG: --kube-master-url=""
I0404 05:46:45.794795       1 server.go:112] FLAG: --kubecfg-file=""
I0404 05:46:45.794853       1 server.go:112] FLAG: --log-backtrace-at=":0"
I0404 05:46:45.794933       1 server.go:112] FLAG: --log-dir=""
I0404 05:46:45.795003       1 server.go:112] FLAG: --log-flush-frequency="5s"
I0404 05:46:45.795073       1 server.go:112] FLAG: --logtostderr="true"
I0404 05:46:45.795144       1 server.go:112] FLAG: --nameservers=""
I0404 05:46:45.795202       1 server.go:112] FLAG: --stderrthreshold="2"
I0404 05:46:45.795264       1 server.go:112] FLAG: --v="2"
I0404 05:46:45.795324       1 server.go:112] FLAG: --version="false"
I0404 05:46:45.795407       1 server.go:112] FLAG: --vmodule=""
I0404 05:46:45.795793       1 server.go:175] Starting SkyDNS server (0.0.0.0:10053)
I0404 05:46:45.800841       1 server.go:197] Skydns metrics enabled (/metrics:10055)
I0404 05:46:45.800982       1 dns.go:147] Starting endpointsController
I0404 05:46:45.801050       1 dns.go:150] Starting serviceController
I0404 05:46:45.802186       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0404 05:46:45.802431       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0404 05:46:46.194772       1 dns.go:264] New service: kubernetes
I0404 05:46:46.199497       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:46:46.201053       1 dns.go:264] New service: kube-dns
I0404 05:46:46.201745       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:46:46.202287       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:46:46.302608       1 dns.go:171] Initialized services and endpoints from apiserver
I0404 05:46:46.302733       1 server.go:128] Setting up Healthz Handler (/readiness)
I0404 05:46:46.302843       1 server.go:133] Setting up cache handler (/cache)
I0404 05:46:46.302935       1 server.go:119] Status HTTP port 8081
I0404 05:51:45.802627       1 dns.go:264] New service: kubernetes
I0404 05:51:45.803656       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:51:45.804266       1 dns.go:264] New service: kube-dns
I0404 05:51:45.804771       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:51:45.805283       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:54:12.745272       1 dns.go:264] New service: kubernetes-dashboard
I0404 05:56:45.805684       1 dns.go:264] New service: kubernetes
I0404 05:56:45.809947       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:56:45.811538       1 dns.go:264] New service: kube-dns
I0404 05:56:45.812488       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:56:45.813454       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 05:56:45.814443       1 dns.go:264] New service: kubernetes-dashboard
I0404 06:01:45.806051       1 dns.go:264] New service: kube-dns
I0404 06:01:45.806895       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 06:01:45.807408       1 dns.go:462] Added SRV record &{Host:kube-dns.kube-system.svc.cluster.local. Port:53 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}
I0404 06:01:45.807884       1 dns.go:264] New service: kubernetes-dashboard
I0404 06:01:45.808341       1 dns.go:264] New service: kubernetes
I0404 06:01:45.808752       1 dns.go:462] Added SRV record &{Host:kubernetes.default.svc.cluster.local. Port:443 Priority:10 Weight:10 Text: Mail:false Ttl:30 TargetStrip:0 Group: Key:}

I don't see any errors in any of it, just an endless stream of it finding "kubernetes" and "kube-dns" as "new services" and adding SRV records for them.
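Since kube-dns itself looks healthy, one more thing I could try is querying it directly at its pod IP (10.244.0.2, per the endpoints above) rather than through the service VIP, to separate "DNS is broken" from "nothing can reach anything". For example, from the debug pod, which has dig installed (I haven't run this yet; I can post the results if it would help):

/ # dig @10.244.0.2 kubernetes.default.svc.cluster.local +time=2 +tries=1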
Here are the logs for Flannel on a node where the Flannel pod never restarted:

$ kubectl logs kube-flannel-ds-g3dwn -c kube-flannel -n kube-system
I0404 05:46:05.193078       1 kube.go:109] Waiting 10m0s for node controller to sync
I0404 05:46:05.193340       1 kube.go:289] starting kube subnet manager
I0404 05:46:06.194279       1 kube.go:116] Node controller sync successful
I0404 05:46:06.194463       1 main.go:132] Installing signal handlers
I0404 05:46:06.196013       1 manager.go:136] Determining IP address of default interface
I0404 05:46:06.199502       1 manager.go:149] Using interface with name eth0 and address 10.0.1.101
I0404 05:46:06.199681       1 manager.go:166] Defaulting external address to interface address (10.0.1.101)
I0404 05:46:06.631802       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
I0404 05:46:06.665265       1 ipmasq.go:47] Adding iptables rule: -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
I0404 05:46:06.700650       1 ipmasq.go:47] Adding iptables rule: ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
I0404 05:46:06.720807       1 manager.go:250] Lease acquired: 10.244.0.0/24
I0404 05:46:06.722263       1 network.go:58] Watching for L3 misses
I0404 05:46:06.722473       1 network.go:66] Watching for new subnet leases
I0405 04:46:06.678418       1 network.go:160] Lease renewed, new expiration: 2017-04-06 04:46:06.652848051 +0000 UTC

Here are logs from a failed Flannel pod on one of the nodes where it has restarted a few times:

$ kubectl logs kube-flannel-ds-x5t2h -c kube-flannel -n kube-system -p
E0404 05:50:02.782218       1 main.go:127] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-x5t2h': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-x5t2h: dial tcp 10.96.0.1:443: i/o timeout
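Since the Flannel DaemonSet runs with hostNetwork: true, I believe that timeout happened in the host's network namespace, which would mean the service VIP wasn't reachable even from the node itself at that point. I can re-test that directly from each host if useful; my expectation is that the API server would at least answer (even if only with an authorization error) to something like:

$ curl -k --connect-timeout 5 https://10.96.0.1:443/version
$ curl -k --connect-timeout 5 https://10.0.1.101:6443/version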
Here are the iptables rules that appear identically on all four servers:

$ sudo iptables-save
# Generated by iptables-save v1.4.21 on Wed Apr 5 10:01:19 2017
*nat
:PREROUTING ACCEPT [3:372]
:INPUT ACCEPT [3:372]
:OUTPUT ACCEPT [26:1659]
:POSTROUTING ACCEPT [26:1659]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-HHOMLR7ARJQ6WUFK - [0:0]
:KUBE-SEP-IT2ZTR26TO4XFPTO - [0:0]
:KUBE-SEP-YIL6JZP7A3QYXJU2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
-A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-HHOMLR7ARJQ6WUFK -s 10.0.1.101/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-HHOMLR7ARJQ6WUFK -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-HHOMLR7ARJQ6WUFK --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 10.0.1.101:6443
-A KUBE-SEP-IT2ZTR26TO4XFPTO -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-IT2ZTR26TO4XFPTO -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -s 10.244.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-YIL6JZP7A3QYXJU2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.2:53
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-IT2ZTR26TO4XFPTO
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 10800 --reap --name KUBE-SEP-HHOMLR7ARJQ6WUFK --mask 255.255.255.255 --rsource -j KUBE-SEP-HHOMLR7ARJQ6WUFK
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-HHOMLR7ARJQ6WUFK
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-YIL6JZP7A3QYXJU2
COMMIT
# Completed on Wed Apr 5 10:01:19 2017
# Generated by iptables-save v1.4.21 on Wed Apr 5 10:01:19 2017
*filter
:INPUT ACCEPT [1943:614999]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [1949:861554]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -j KUBE-FIREWALL
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
COMMIT
# Completed on Wed Apr 5 10:01:19 2017
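One thing I notice in the filter table is that the FORWARD chain's policy is DROP and the only ACCEPT rules there are Docker's (matching docker0, not cni0), so I'm wondering whether traffic forwarded from cni0 out to eth0 or flannel.1 is simply being dropped there. If that theory is worth checking, I can watch the chain's packet/byte counters while repeating the failing curl, and confirm the forwarding sysctls:

$ sudo iptables -vnL FORWARD
$ sudo sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables

I can post that output too if anyone thinks it's relevant.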
Here is the output of ifconfig on the server running the Kubernetes master components (kube-01):

$ ifconfig
cni0      Link encap:Ethernet  HWaddr 0a:58:0a:f4:00:01
          inet addr:10.244.0.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::807b:3bff:fedf:ff7d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:322236 errors:0 dropped:0 overruns:0 frame:0
          TX packets:331776 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:74133093 (70.6 MiB)  TX bytes:73272040 (69.8 MiB)

docker0   Link encap:Ethernet  HWaddr 02:42:43:63:54:be
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr b8:27:eb:fa:0d:18
          inet addr:10.0.1.101  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::ba27:ebff:fefa:d18/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1594829 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1234243 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:482745836 (460.3 MiB)  TX bytes:943355891 (899.6 MiB)

flannel.1 Link encap:Ethernet  HWaddr 7a:54:f6:da:6b:a0
          inet addr:10.244.0.0  Bcast:0.0.0.0  Mask:255.255.255.255
          inet6 addr: fe80::7854:f6ff:feda:6ba0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:38 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:204 (204.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:5523912 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5523912 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:2042135076 (1.9 GiB)  TX bytes:2042135076 (1.9 GiB)

vethbe064275 Link encap:Ethernet  HWaddr 1e:4f:ea:70:9f:e1
          inet6 addr: fe80::1c4f:eaff:fe70:9fe1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:322237 errors:0 dropped:0 overruns:0 frame:0
          TX packets:331794 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:78644487 (75.0 MiB)  TX bytes:73275343 (69.8 MiB)

And here it is on the worker node kube-02:

$ ifconfig
cni0      Link encap:Ethernet  HWaddr 0a:58:0a:f4:01:01
          inet addr:10.244.1.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::383a:41ff:fea4:f113/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:125 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7794 (7.6 KiB)  TX bytes:7391 (7.2 KiB)

docker0   Link encap:Ethernet  HWaddr 02:42:ad:1b:1e:a3
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr b8:27:eb:bb:ff:69
          inet addr:10.0.1.102  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::ba27:ebff:febb:ff69/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:750764 errors:0 dropped:0 overruns:0 frame:0
          TX packets:442199 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:597869801 (570.1 MiB)  TX bytes:42574858 (40.6 MiB)

flannel.1 Link encap:Ethernet  HWaddr 7a:ce:5a:3b:78:80
          inet addr:10.244.1.0  Bcast:0.0.0.0  Mask:255.255.255.255
          inet6 addr: fe80::78ce:5aff:fe3b:7880/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:38 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:240 (240.0 B)  TX bytes:240 (240.0 B)

Again, if anyone has made it this far, please let me know if you have any ideas, or if there are other commands I can show the output of to help narrow it down!

Thanks very much,
Jimmy