On Fri, Apr 17, 2015 at 03:06:45PM -0700, Doug Barton wrote:
> I have always believed (based on both the man pages, and what I've
> seen in the field) that Unix stub resolvers follow the behavior
> described in the man page. That is, they try the first 'nameserver'
> address listed, and if it doesn't get a response before the timeout
> value expires it then moves on to the next one in line.
>
> I was having a discussion with someone about that issue today who
> insists that they have empirical evidence that this is not the case,
> that they have seen stubs that round robin the addresses. So, I'm
> wondering if y'all have seen the same thing?

It is configurable. See resolv.conf(5), specifically the "rotate" option.

Unix stub resolvers are a mess: it is hopeless to rely on any kind of sane failover behavior with multiple nameservers listed in /etc/resolv.conf. Many servers/applications will hang if the first listed nameserver is down, or at least take so long to fail over to the next nameserver that your service/application is effectively dead. Usually each new incoming request will start over at the first nameserver. Finally, most long-running processes won't bother to re-read /etc/resolv.conf if it changes, so even if you change the order during an outage (see [1]), it won't help.

[1] http://kvz.io/blog/2013/03/27/poormans-way-to-decent-dns-failover/

"nsfailover" is a nice idea, but it doesn't work in practice for long-running server processes. It might be okay for desktop systems.

The problem is that no system-wide state is kept to track the status of each of the nameservers listed in /etc/resolv.conf. The C library keeps this state for each process and/or thread. If you have a server process that spawns a new thread or process for each incoming request, each process/thread will start over at the first nameserver and go through the timeout process until it finds a working nameserver. It may even be as bad as every new DNS request in the SAME process starting over from the first one.

RES_TIMEOUT defaults to 5 seconds, and RES_DFLRETRY defaults to 2. So each DNS query could potentially hang for up to 10 seconds unless you have a really smart application that does the right thing and/or implements its own stub resolver.

Windows doesn't have this problem because it comes with a system-wide DNS cache by default. I'm not sure about OS X; it may also come with a cache. The Linux folks are working on solutions. One attempt is systemd-resolved. But you can't rely on such nice client-side solutions/behavior because most Unix systems are still broken out-of-the-box.

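To make the arithmetic above concrete, here is a minimal Python sketch that reads resolv.conf-style text, picks up the "rotate", "timeout:" and "attempts:" options, and multiplies timeout by attempts the same way I did above (5 x 2 = 10). The names ResolvConf, parse_resolv_conf and naive_worst_case_seconds are made up for illustration, the addresses are documentation-only examples, and the product is only the rough bound used in this message. Real resolvers schedule their retries differently, so treat it as a back-of-the-envelope estimate and not as a model of any particular C library.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvConf:
    """Hypothetical container for the few resolv.conf settings discussed here."""
    nameservers: list = field(default_factory=list)
    timeout: int = 5    # seconds; mirrors the RES_TIMEOUT default quoted above
    attempts: int = 2   # mirrors the RES_DFLRETRY default quoted above
    rotate: bool = False


def parse_resolv_conf(text):
    """Parse nameserver lines and the rotate/timeout/attempts options."""
    conf = ResolvConf()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":   # skip blank lines and comments
            continue
        fields = line.split()
        keyword, args = fields[0], fields[1:]
        if keyword == "nameserver" and args:
            conf.nameservers.append(args[0])
        elif keyword == "options":
            for opt in args:
                name, _, value = opt.partition(":")
                if name == "rotate":
                    conf.rotate = True
                elif name == "timeout" and value.isdigit():
                    conf.timeout = int(value)
                elif name == "attempts" and value.isdigit():
                    conf.attempts = int(value)
    return conf


def naive_worst_case_seconds(conf):
    """Rough bound used in this message: timeout multiplied by attempts."""
    return conf.timeout * conf.attempts


# Example input only; 192.0.2.x is reserved for documentation.
SAMPLE = """\
# example resolv.conf
nameserver 192.0.2.1
nameserver 192.0.2.2
options rotate timeout:1 attempts:3
"""

if __name__ == "__main__":
    defaults = parse_resolv_conf("nameserver 192.0.2.1\n")
    tuned = parse_resolv_conf(SAMPLE)
    print("defaults:", naive_worst_case_seconds(defaults), "seconds")
    print("tuned:   ", naive_worst_case_seconds(tuned), "seconds, rotate =", tuned.rotate)
```

Lowering "timeout:" and "attempts:" shrinks the stall, but it does not change the underlying problem that every process keeps its own idea of which nameserver is alive.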
As a DNS resolver operator for my campus, I've come to this unfortunate conclusion after months of research and testing. The only sane thing to do is:

1. Run a system-wide DNS caching resolver on 127.0.0.1, and point /etc/resolv.conf to that.

or

2. Use anycast to make your multiple DNS servers appear as one IP, and put that one IP in /etc/resolv.conf. You can have multiple IPs, but each one should still be anycasted.

I know my answer was way more than you asked for, but I had to take this chance to get the word out :-)

_______________________________________________
dns-operations mailing list
https://lists.dns-oarc.net/mailman/listinfo/dns-operations
dns-jobs mailing list
https://lists.dns-oarc.net/mailman/listinfo/dns-jobs
