Hey Simon,

Thanks for looking into this. It just happened again and I have more data.

To clarify a few things, docker is configured to send *all* dns requests to
172.17.0.1:53 (docker itself creates the 172.17.0.1 interface)*.* I
implemented dnsmasq with bind-dynamic because docker would typically start
*after* dnsmasq; dnsmasq would never get a chance to bind to 172.17.0.1.
The ipv6 sockets that dnsmasq binds to (shown below) are never actually
utilized by anything in my system, just a side effect of the current
configuration.

Sample ifconfig
```
docker0   Link encap:Ethernet  HWaddr 02:42:ea:b9:37:ce
          inet addr:172.17.0.1  Bcast:172.17.0.255  Mask:255.255.255.0
          inet6 addr: fe80::42:eaff:feb9:37ce/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1137922568 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1183858585 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:694264821785 (694.2 GB)  TX bytes:731462199754 (731.4 GB)

ens5      Link encap:Ethernet  HWaddr 02:f3:39:28:91:08
          inet addr:10.3.4.228  Bcast:10.3.4.255  Mask:255.255.255.0
          inet6 addr: fe80::f3:39ff:fe28:9108/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:1414749050 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1409286954 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:746355077483 (746.3 GB)  TX bytes:716453625946 (716.4 GB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:39175242 errors:0 dropped:0 overruns:0 frame:0
          TX packets:39175242 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:15795890457 (15.7 GB)  TX bytes:15795890457 (15.7 GB)

```

Failed Node (netstat)
```
udp6       0      0 fe80::888a:23ff:fe7a:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::d817:59ff:feb4:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::7c83:e5ff:fe73:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::387c:fff:fe8b::53 :::*
    1359/dnsmasq
udp6       0      0 fe80::844c:15ff:fe94:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::8402:6bff:fefa:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::740e:c1ff:feef:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::f841:8fff:fefb:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::d0b3:42ff:fe20:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::984b:45ff:fe2b:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::e4cc:c4ff:fe87:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::704d:afff:fec4:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::707c:7eff:fea2:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::c8c3:70ff:fe8c:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::506b:10ff:fe1a:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::88c:2eff:fe38::53 :::*
    1359/dnsmasq
udp6       0      0 fe80::4478:d5ff:fea0:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::1c34:96ff:fe2c:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::1079:21ff:fe36:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::4dc:53ff:fe8a::53 :::*
    1359/dnsmasq
udp6       0      0 fe80::c851:e0ff:fe4c:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::4c0:e3ff:fe15::53 :::*
    1359/dnsmasq
udp6       0      0 fe80::fcca:afff:fe23:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::a067:8cff:fefb:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::1081:54ff:feca:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::fc67:89ff:fe1a:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::a892:65ff:fe5a:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::689e:e5ff:fe4d:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::9cc5:59ff:fe94:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::f031:faff:feb8:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::ccf2:e4ff:fe3f:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::5c66:efff:fe89:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::2caf:fcff:fe07:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::ac43:e1ff:fe44:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::1c3f:ccff:fe94:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::2ca6:27ff:feac:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::dc16:3bff:fe2b:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::58e8:ddff:fe94:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::9c60:f7ff:fe12:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::d02d:a9ff:fe52:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::42:eaff:feb9:3:53 :::*
    1359/dnsmasq
udp6       0      0 fe80::c840:a8ff:fe6d:53 :::*
    1359/dnsmasq
udp6       0      0 ::1:53                  :::*
              1359/dnsmasq
udp6       0      0 fe80::f3:39ff:fe28:9:53 :::*
    1359/dnsmasq
```

Healthy Node (netstat)
```
*udp        0      0 127.0.0.1:53 <http://127.0.0.1:53>
0.0.0.0:*                           1400/dnsmasq*
*udp        0      0 10.3.1.79:53 <http://10.3.1.79:53>
0.0.0.0:*                           1400/dnsmasq*
*udp        0      0 172.17.0.1:53 <http://172.17.0.1:53>
 0.0.0.0:*                           1400/dnsmasq*
udp6       0      0 fe80::54d7:eff:fe7a::53 :::*
    1400/dnsmasq
udp6       0      0 fe80::6ccd:80ff:fe39:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::b870:1dff:fe4f:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::b8f9:20ff:fe9b:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::7416:9bff:fed7:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::94d7:29ff:fe2a:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::b4e1:a0ff:fe7e:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::84b8:20ff:feb2:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::603f:dbff:feec:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::284f:5fff:fe4f:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::100b:7aff:fe23:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::b088:7eff:fe20:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::4434:10ff:fe9e:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::f052:16ff:fe10:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::789b:19ff:fefd:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::c66:baff:fe98::53 :::*
    1400/dnsmasq
udp6       0      0 fe80::f08c:f7ff:fe7d:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::9898:dff:fe21::53 :::*
    1400/dnsmasq
udp6       0      0 fe80::3847:85ff:fe30:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::98cd:23ff:fe7f:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::c9b:6bff:fed6::53 :::*
    1400/dnsmasq
udp6       0      0 fe80::b002:f6ff:fe8a:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::8cf8:63ff:fe9a:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::4095:1eff:fe09:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::e8cf:10ff:fe32:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::fca4:12ff:fed7:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::d4e6:d4ff:fe7e:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::8c74:57ff:fe24:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::c0bb:f3ff:fe49:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::b462:beff:fe75:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::c816:f6ff:fec8:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::409f:6dff:feb8:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::30b8:caff:fec8:53 :::*
    1400/dnsmasq
udp6       0      0 fe80::42:12ff:fe82:5:53 :::*
    1400/dnsmasq
udp6       0      0 ::1:53                  :::*
             1400/dnsmasq
udp6       0      0 fe80::468:88ff:fe00::53 :::*
    1400/dnsmasq

```

The above 'udp' interfaces (not udp6) are actually the only ones I care
about - these are the ones that are mysteriously disappearing.

Failed Node Logs
```
/var/log/syslog.48:2018-02-03T13:36:23.540231+00:00 dnsmasq[1359]: failed
to create listening socket for fe80::fc42:b2ff:fe85:7702: No such device
/var/log/syslog.48:2018-02-03T13:36:23.540442+00:00 dnsmasq[1359]: failed
to create listening socket for fe80::fc42:b2ff:fe85:7702: No such device
/var/log/syslog.69:2018-02-02T16:56:16.907906+00:00 dnsmasq[1359]: failed
to create listening socket for fe80::dcb4:e5ff:fe94:9b31: No such device
/var/log/syslog.69:2018-02-02T16:56:16.908069+00:00 dnsmasq[1359]: failed
to create listening socket for fe80::dcb4:e5ff:fe94:9b31: No such device
/var/log/syslog.7:2018-02-05T06:05:31.066648+00:00 dnsmasq[1359]: failed to
create listening socket for 172.17.0.1: Address already in use
/var/log/syslog.7:2018-02-05T06:05:31.066813+00:00 dnsmasq[1359]: failed to
create listening socket for 10.3.4.228: Address already in use
/var/log/syslog.7:2018-02-05T06:05:31.066917+00:00 dnsmasq[1359]: failed to
create listening socket for 127.0.0.1: Address already in use
```

Hopefully this helps clarify what I'm seeing. Please let me know if you
need any additional information.

Thanks,
Zach


On Sun, Feb 4, 2018 at 10:13 AM, Simon Kelley <si...@thekelleys.org.uk>
wrote:

> You're using --bind-dynamic, so dnsmasq creates a set of listener
> sockets, one for each if the addresses on the machine's interfaces. When
> such an address is removed or created, dnsmasq gets an event via
> netlink, and it does two things,
>
> 1) Enumerates the current set of addresses on the machines interfaces.
> 2) Updates the set of listening sockets to reflect that set: creating
> new sockets for new addresses, and deleting sockets for addresses which
> are no longer in use.
>
> That process is an obvious potential source of the behaviour you're seeing.
>
> I'm slightly confused that your description talks about THE socket file:
> there should be one for each address possessed by the machine. To try
> and get a handle on what's happening, we need  to see what's happening
> to all the members of that set.
>
> "DNS resolutions sent to system[dnsmasq] (127.0.0.1:53) time out"
>
> implies that 127.0.1:53 the socket listening on 127.0.0.1 is going, but
> are there still sockets listening on port 53 for other addresses, or are
> all the UDP sockets going?
>
> There is a second set of sockets listening on the same addresses/ports
> for TCP connections. It would be interesting to see if the TCP sockets
> go as well, or is it's only the UDP socket that disappears?
>
> Is the network config of docker containers changed at any time? If so,
> forcing that is an obvious way of trying to reproduce this problem.
> Given that 127.0.0.1 is an address which dispappears, anything which
> fiddles with the lo interface is of particular suspicion.
>
>
>
> Cheers,
>
>
> Simon.
>
>
>
>
> On 31/01/18 21:12, Zi Dvbelju wrote:
> > *
> >
> > I’m experiencing an issue where allDNS resolutions sent to dnsmasq
> > timeout, but only after the dnsmasq service has been successfullyrunning
> > for a period of time (anecdotally, after a few weeks of time). After a
> > lot of digging, I’ve discovered that dnsmasq’s UDP socket file will
> > eventually “disappear”. The issue can be resolved by restarting the
> > dnsmasq service.
> >
> >
> > I haven’t been able to reproduce it yet, but it has happened numerous
> > times on servers which are running dozens of docker containers. From
> > what I know, nothing should be removing this socket file and I can’t
> > find anything relevant in the dnsmasq logs. Is anyone aware of any
> > situations that can cause socket files to disappear?
> >
> >
> > Environment
> >
> > Ubuntu 16.04.3 LTS
> >
> > 8 Cores, 16GB of RAM
> >
> > Dnsmasq 2.75-1ubuntu0.16.04.4
> >
> >
> > Background
> >
> > I’m using dnsmasq to forward requests to Consul
> > <https://www.consul.io/docs/guides/forwarding.html>, which is used for
> > service discovery. The Consul agent listens on port 8600 and is
> > configured to bind to allinterfaces (the relevant interface here is
> > 172.17.0.1, which docker creates).
> >
> >
> > Resolv.conf
> >
> > ```
> >
> > # Dynamic resolv.conf(5) file for glibc resolver(3) generated by
> > resolvconf(8)
> >
> > nameserver 127.0.0.1
> >
> > ```
> >
> >
> > Dnsmasq.conf
> >
> > ```
> >
> > server=/consul/172.17.0.1#8600 <http://172.17.0.1/#8600>
> >
> > server=/10.in-addr.arpa/172.17.0.1#8600 <http://172.17.0.1/#8600>
> >
> > bind-dynamic
> >
> > ```
> >
> >
> > Systemd config for Docker
> >
> > ```
> >
> > ExecStart=/usr/bin/dockerd --bip=172.17.0.1/24 <http://172.17.0.1/24>
> > --dns=172.17.0.1 -H fd://
> >
> > ```
> >
> > While investigating the servers in the broken state, I observed the
> > following:
> >
> >
> >   *
> >
> >     nslookup / dig DNS resolutions are timing out
> >
> >   *
> >
> >     Docker logs show containers are also timing out on DNS resolutions
> >
> >   *
> >
> >     Systemd reports that dnsmasq is still running, pid still exists
> >
> >   *
> >
> >     DNS resolutions sent directly to the consul agent (127.0.0.1:8600
> >     <http://127.0.0.1:8600/>) succeed
> >
> >   *
> >
> >     DNS resolutions sent to system[dnsmasq] (127.0.0.1:53
> >     <http://127.0.0.1:53/>) time out
> >
> >   *
> >
> >     IPV6 UDP (::1) resolutions sent to dnsmasq succeeded
> >
> >   *
> >
> >     Netstat shows that the IPV4 UDP socket file for dnsmasq is missing
> >
> >   *
> >
> >     No relevant messages in kernel log (specifically, no dnsmasq OOM
> >     kill events)
> >
> >   *
> >
> >     File descriptor usage for the entire server was normal
> >
> >   *
> >
> >     File descriptor usage for the individual dnsmasq process was normal
> >
> >   *
> >
> >     CPU, RAM, and storage all look good
> >
> >
> > Thanks in advance for any discussion at all - I've been really
> > struggling with this one for a while now.
> >
> >
> > Zach
> >
> > *
> >
> >
> > _______________________________________________
> > Dnsmasq-discuss mailing list
> > Dnsmasq-discuss@lists.thekelleys.org.uk
> > http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
> >
>
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss@lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
>
_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss

Reply via email to