Well, all the prodding from people here prompted me to investigate further exactly what's going on. The problem isn't what I thought it was. It appears to be a bug in glibc, and I've filed a bug report and found a workaround.

In a nutshell, the getaddrinfo function in glibc sends both A and AAAA queries to the DNS server at the same time and then deals with the responses as they come in. Unfortunately, if the responses to the two queries come back in reverse order, /and/ the first one to come back is a server failure, the glibc code screws up and decides it didn't get back a successful response even though it did. Both conditions hold when you try to resolve en.wikipedia.org immediately after restarting your DNS server, so that nothing is cached.
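For anyone who wants to poke at this themselves, here is a minimal sketch of the kind of lookup that triggers the paired queries. It assumes a Linux box where Python's socket.getaddrinfo hands off to the glibc getaddrinfo; the helper name lookup and the port number are just illustrative choices of mine, not anything taken from glibc or BIND.

```python
import socket


def lookup(host, port=80):
    """Return the sorted unique addresses for host, or [] if resolution fails.

    AF_UNSPEC asks for both IPv4 and IPv6 results, which is the case
    where the resolver issues A and AAAA queries for the same name.
    """
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    except socket.gaierror:
        # The resolver reported a failure for this name.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Run this right after restarting the DNS server, then run it again.
    print(lookup("en.wikipedia.org"))
```

If the problem described above is in play, I'd expect the first run against a cold cache to come back empty and a second run to return addresses, but that depends on the timing of the responses.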

If you do the same lookup again, it works, because the CNAME that was sent in response to the A query is now cached, so both the A and AAAA queries get back valid responses from the DNS server. And even if that weren't the case, the cached CNAME would be returned first, since the server doesn't need to do a query to get it, whereas it does need to do another query to get the AAAA record (which, recall, isn't being cached because of the previously discussed FORMERR problem). It'll keep working until the cached records time out, at which point the failure happens again, and then lookups are OK again until the records time out, and so on.

The workaround is to put "options single-request" in /etc/resolv.conf to prevent the glibc innards from sending out both the A and AAAA queries at the same time.
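As a rough illustration of what the workaround looks like in the file, and a quick way to check whether a machine already has it, here is a small sketch. The function name has_single_request and the nameserver address in the sample text are made up for the example; only the "options single-request" line is the actual workaround.

```python
def has_single_request(resolv_conf_text):
    """Return True if an 'options' line lists the single-request option."""
    for line in resolv_conf_text.splitlines():
        # Lines with '#' or ';' in the first column are comments.
        if line.startswith(("#", ";")):
            continue
        fields = line.split()
        # Match the option exactly, so single-request-reopen does not count.
        if fields and fields[0] == "options" and "single-request" in fields[1:]:
            return True
    return False


# Sample file contents; the nameserver address is only a placeholder.
EXAMPLE = """\
nameserver 127.0.0.1
options single-request
"""

if __name__ == "__main__":
    with open("/etc/resolv.conf") as f:
        print(has_single_request(f.read()))
```

Note that single-request-reopen is a separate option, which is why the check above matches the option name exactly rather than by prefix.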

FYI, here's the glibc bug I filed about this:

http://sourceware.org/bugzilla/show_bug.cgi?id=12994

Thank you for telling me I was full of it and making me dig deeper into this until I located the actual cause of the issue. :-)

  jik
