On 25/09/14 16:49, Chris West wrote:
> Yes, the lxc scripts are starting dnsmasq to a) provide dhcp and name
> resolution for the vms, and b) proxy vms' DNS traffic to the host's dns
> servers. The full command (without config file) is below.
>
> --log-queries has got me further. It appears that dnsmasq is caching the
> AAAA NXDOMAIN for the hostname past when the A record exists, such as when
> the machine booted after dnsmasq started.

Fabulous, that's really good information.

>
> Is this expected behaviour?

Expected, yes; optimum, possibly not.

To be clear, the query for AAAA test-foo is being forwarded to
213.186.33.99, and 213.186.33.99 is replying NXDOMAIN with some non-zero
time-to-live. This means that dnsmasq will continue to reply with
NXDOMAIN for that query for that time-to-live, even if dnsmasq
subsequently discovers that test-foo has started to exist as a domain
because of the creation of a DHCPv4 lease.

You could certainly argue that dnsmasq should change its answer to
NODATA once the lease exists, but that's not a complete answer: the
first answer might already be cached by the recipient.

The real problem here is that you're asking 213.186.33.99 about domains
which it has no way of answering correctly, and it's giving answers with
non-zero time-to-live values that are polluting the view of the DNS.

The solution is quite easy: you can tell dnsmasq not to forward such
"bare" queries with

--domain-needed

That should change the reply to the first AAAA query to NXDOMAIN with a
zero TTL, and the reply to the subsequent AAAA query to NODATA, because
the A record now exists.

Cheers,

Simon.
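
As a rough illustration of which queries count as "bare" for
--domain-needed, the shell sketch below approximates the test: a name
with no dots in it (ignoring a single trailing dot, as dig prints it).
The function is_plain_name is a hypothetical helper written for this
note, not part of dnsmasq, and the real check happens inside dnsmasq
itself, so treat this as a model under those assumptions.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of dnsmasq): approximates the "plain name"
# test that --domain-needed applies before forwarding A/AAAA queries.
is_plain_name() {
    name=${1%.}                     # drop one trailing dot, e.g. "test-foo."
    [ -n "$name" ] || return 1      # the empty (root) name is not a plain name
    case $name in
        *.*) return 1 ;;            # has a domain part: forwarded as usual
        *)   return 0 ;;            # no dots: answered locally or "not found"
    esac
}

# Names taken from the logs in this thread.
for n in test-foo test-foo. ntp.ubuntu.com; do
    if is_plain_name "$n"; then
        echo "$n: plain name, not forwarded with --domain-needed"
    else
        echo "$n: has a domain part, forwarded upstream"
    fi
done
```

With the command line quoted in this thread, the change itself is just
adding --domain-needed alongside the existing options; the sketch only
shows why test-foo is affected and ntp.ubuntu.com is not.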
>
> Example logs; annotations surrounded by "****":
>
> ubuntu@wolf ~$ sudo dnsmasq -u lxc-dnsmasq --strict-order --bind-interfaces
> --pid-file=/run/lxc/dnsmasq.pid --conf-file= --listen-address 10.0.3.1
> --dhcp-range 10.0.3.2,10.0.3.254 --dhcp-lease-m=253 --dhcp-no-override
> --except-interface=lo --interface=lxcbr0
> --dhcp-leasefile=/var/lib/misc/dnsmasq.lxcbr0.leases --dhcp-authoritative
> --log-queries -k --log-facility=/dev/stdout
>
> Sep 25 16:40:45 dnsmasq[24567]: started, version 2.72 cachesize 150
> Sep 25 16:40:45 dnsmasq[24567]: compile time options: IPv6 GNU-getopt DBus
> i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect
> Sep 25 16:40:45 dnsmasq-dhcp[24567]: DHCP, IP range 10.0.3.2 -- 10.0.3.254,
> lease time 1h
> Sep 25 16:40:45 dnsmasq-dhcp[24567]: DHCP, sockets bound exclusively to
> interface lxcbr0
> Sep 25 16:40:45 dnsmasq[24567]: reading /etc/resolv.conf
> Sep 25 16:40:45 dnsmasq[24567]: ignoring nameserver 10.0.3.1 - local
> interface
> Sep 25 16:40:45 dnsmasq[24567]: using nameserver 213.186.33.99#53
> Sep 25 16:40:45 dnsmasq[24567]: read /etc/hosts - 9 addresses
>
> **** poison the caches with some dig -t A test-foo; dig -t AAAA test-foo:
> ****
>
> Sep 25 16:40:54 dnsmasq[24567]: query[A] test-foo from 10.0.3.1
> Sep 25 16:40:54 dnsmasq[24567]: forwarded test-foo to 213.186.33.99
> Sep 25 16:40:54 dnsmasq[24567]: reply test-foo is NXDOMAIN
> Sep 25 16:41:00 dnsmasq[24567]: query[AAAA] test-foo from 10.0.3.1
> Sep 25 16:41:00 dnsmasq[24567]: forwarded test-foo to 213.186.33.99
> Sep 25 16:41:00 dnsmasq[24567]: reply test-foo is NXDOMAIN
>
> **** here I start the machine test-foo, and it gets its dhcp lease: ****
>
> Sep 25 16:41:08 dnsmasq-dhcp[24567]: DHCPDISCOVER(lxcbr0) 10.0.3.123
> 9a:8b:50:fb:4c:a0
> Sep 25 16:41:08 dnsmasq-dhcp[24567]: DHCPOFFER(lxcbr0) 10.0.3.123
> 9a:8b:50:fb:4c:a0
> Sep 25 16:41:08 dnsmasq-dhcp[24567]: DHCPREQUEST(lxcbr0) 10.0.3.123
> 9a:8b:50:fb:4c:a0
> Sep 25 16:41:08 dnsmasq-dhcp[24567]: DHCPACK(lxcbr0) 10.0.3.123
> 9a:8b:50:fb:4c:a0 test-foo
>
> **** and does some bootup nonsense: ****
>
> Sep 25 16:41:08 dnsmasq[24567]: query[A] ntp.ubuntu.com from 10.0.3.123
> Sep 25 16:41:08 dnsmasq[24567]: forwarded ntp.ubuntu.com to 213.186.33.99
> Sep 25 16:41:08 dnsmasq[24567]: query[AAAA] ntp.ubuntu.com from 10.0.3.123
> Sep 25 16:41:08 dnsmasq[24567]: forwarded ntp.ubuntu.com to 213.186.33.99
> Sep 25 16:41:08 dnsmasq[24567]: reply ntp.ubuntu.com is NODATA-IPv6
> Sep 25 16:41:08 dnsmasq[24567]: reply ntp.ubuntu.com is 91.189.89.199
> Sep 25 16:41:08 dnsmasq[24567]: reply ntp.ubuntu.com is 91.189.94.4
>
> **** now I can re-run my dig -t A test-foo: ****
>
> Sep 25 16:41:15 dnsmasq[24567]: query[A] test-foo from 10.0.3.1
> Sep 25 16:41:15 dnsmasq[24567]: DHCP test-foo is 10.0.3.123
>
> **** ... which works fine, and my dig -t AAAA test-foo: ****
>
> Sep 25 16:41:18 dnsmasq[24567]: query[AAAA] test-foo from 10.0.3.1
> Sep 25 16:41:18 dnsmasq[24567]: cached test-foo is NXDOMAIN
>
> ....and get the NXDOMAIN.
>
> If I restart dnsmasq after all the machines are started:
>
> Sep 25 16:46:08 dnsmasq[25559]: started, version 2.72 cachesize 150
> Sep 25 16:46:08 dnsmasq[25559]: compile time options: IPv6 GNU-getopt DBus
> i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect
> Sep 25 16:46:08 dnsmasq-dhcp[25559]: DHCP, IP range 10.0.3.2 -- 10.0.3.254,
> lease time 1h
> Sep 25 16:46:08 dnsmasq-dhcp[25559]: DHCP, sockets bound exclusively to
> interface lxcbr0
> Sep 25 16:46:08 dnsmasq[25559]: reading /etc/resolv.conf
> Sep 25 16:46:08 dnsmasq[25559]: ignoring nameserver 10.0.3.1 - local
> interface
> Sep 25 16:46:08 dnsmasq[25559]: using nameserver 213.186.33.99#53
> Sep 25 16:46:08 dnsmasq[25559]: read /etc/hosts - 9 addresses
>
> **** querying returns NODATA (NOERROR in dig terminology), as I would
> expect ****
>
> Sep 25 16:46:19 dnsmasq[25559]: query[AAAA] test-foo from 10.0.3.1
> Sep 25 16:46:19 dnsmasq[25559]: forwarded test-foo to 213.186.33.99
> Sep 25 16:46:19 dnsmasq[25559]: reply test-foo is NODATA-IPv6
> Sep 25 16:46:36 dnsmasq[25559]: query[A] test-foo from 10.0.3.1
> Sep 25 16:46:36 dnsmasq[25559]: DHCP test-foo is 10.0.3.123
>
> End.
>
> On 24 September 2014 20:38, Simon Kelley <si...@thekelleys.org.uk> wrote:
>
>> The problem for people on this list is that we don't (or, at least, I
>> don't) have any knowledge about lxc. If you can give us information
>> about the dnsmasq configuration that's being generated by lxc, then we
>> stand a better chance of diagnosing the problem.
>>
>> I'm assuming that the VMs are getting addresses by DHCP, and therefore
>> the names test-dove and test-harp are names that derive from DHCP leases.
>>
>> Can you enable the dnsmasq option --log-queries, and find the logs
>> associated with your test DNS queries? That should give us some
>> information about where dnsmasq is getting the information from.
>>
>> Cheers,
>>
>> Simon.
>>
>> On 24/09/14 15:01, Chris West wrote:
>>> I've re-tested this with the 2.72 release (I'm pretty sure!) and I'm still
>>> seeing the same intermittent behaviour.
>>>
>>> On 23 September 2014 10:37, Chris West <chris.w...@logicalglue.com> wrote:
>>>
>>>> dnsmasq is being run by the Debian (well, Ubuntu) lxc scripts. I am
>>>> proxying to lxc vms (by name) with nginx (so using the nginx built-in
>>>> resolver, which seems more sensitive than normal resolvers).
>>>>
>>>> On an Ubuntu Trusty machine (dnsmasq 2.68), everything works fine.
>>>>
>>>> On an Ubuntu Utopic machine (dnsmasq 2.71), proxying always fails with
>>>> "..foo could not be resolved (3: Host not found)".
>>>>
>>>> I thought for a while that this might have been:
>>>> * 288df49 - Fix bug when resulted in NXDOMAIN answers instead of NODATA.
>>>> (5 days ago) <Simon Kelley>
>>>>
>>>> ...so I rolled the Utopic machine back to the 2.68 package. (I'm not
>>>> confident with building a replacement dnsmasq given how complex the debian
>>>> LXC stuff is.) However, now this still fails intermittently, and I'm at a
>>>> loss.
>>>>
>>>> Currently I have two running machines, named "test-dove" and "test-harp".
>>>> "harp" was started after "dove". Both resolve fine for A records:
>>>> ubuntu@wolf ~$ dig -t A test-dove @10.0.3.1 | egrep 'status:|IN'
>>>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65076
>>>> ;test-dove. IN A
>>>> test-dove. 0 IN A 10.0.3.168
>>>> ubuntu@wolf ~$ dig -t A test-harp @10.0.3.1 | egrep 'status:|IN'
>>>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35736
>>>> test-harp. 0 IN A 10.0.3.34
>>>>
>>>> However, only "dove" gets a correct answer for AAAA records:
>>>> ubuntu@wolf ~$ dig -t AAAA test-dove @10.0.3.1 | egrep 'status:|IN'
>>>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57038
>>>> ;test-dove. IN AAAA
>>>> ubuntu@wolf ~$ dig -t AAAA test-harp @10.0.3.1 | egrep 'status:|IN'
>>>> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 14476
>>>> ;test-harp. IN AAAA
>>>>
>>>> Is this likely to be fixed by that patch, or can anyone else guess what's
>>>> up with the system?
>>>>
>>>
>>> _______________________________________________
>>> Dnsmasq-discuss mailing list
>>> Dnsmasq-discuss@lists.thekelleys.org.uk
>>> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss