> On our production name servers we have check every 30s if bind
> is alive by sending a SOA query to bind. Today I upgraded a few
> nodes from 9.18.x (x between 17 and 27) to 9.20.1 (Ubuntu 24.04
> with packages from ISC ppa).
>
> Since that, we have sporadic timeouts (3s). On the nodes with
> more qps we see it more often.
>
> Before I dig into the problem, are there any specific changes
> to 9.20 that I should look at? Maybe some default value changes
> for socket buffers, thread handling ...?

I can't answer specifically about BIND 9.20; I'm currently dipping my
toes carefully into the waters of "deploying BIND 9.20 as a recursor".

What you don't say anything about is whether you see increased CPU
load on your hosts, and whether the relationship between QPS and CPU
load has changed after upgrading to 9.20. Also, what general level of
load do you observe on this host (or these hosts)? I.e. how close to
the limit of what it can do are you?

In our deployment, we monitor the relationship between the number of
"udp: dropped due to full socket buffers" and "udp: datagrams
received" (in our case via collectd / graphite / grafana). When we
started doing that, we found that we needed to bump the default UDP
socket buffers quite a bit to get that event rate down to an
acceptable level.

Regrettably, as far as I know, BIND does not have a knob to adjust
the socket buffer size for the UDP sockets BIND itself uses, so what I
ended up doing was bumping the default for UDP sockets on the entire
host via sysctl. In my case that's "fine" because the host is
basically only serving this single function. Then again, I'm the
weirdo running BIND on NetBSD, so the defaults are probably widely
different in your case.

Just an example from one of our publishing (non-recursive) BIND
servers, from "netstat -s" output:

udp:
        1669688117 datagrams received
        0 with incomplete header
        10 with bad data length field
        994 with bad checksum
        10922 dropped due to no socket
        874709 broadcast/multicast datagrams dropped due to no socket
        890955 dropped due to full socket buffers
        1667910527 delivered
        2741883224 PCB hash misses
        1632037948 datagrams output

which comes out to about 0.05% of received datagrams as an overall
average for "dropped due to full socket buffers", but that doesn't
mean there aren't occasional (smallish) spikes in the rate, of course.
And this is with BIND 9.18.29.

In other words: I think more information is needed to help you
diagnose the issue.

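For reference, the ratio described above can be computed directly
from the "netstat -s" text. The following is only a minimal sketch in
Python: the function names are made up for illustration, and the
counter descriptions are copied from the NetBSD sample above, so they
will need adjusting on systems where netstat words its output
differently. Since the counters are cumulative, the second helper
takes two snapshots and reports the ratio for the interval between
them, which is what would reveal short spikes.

```python
import re

# Sample copied from the "netstat -s" output quoted above (NetBSD wording).
SAMPLE = """\
udp:
        1669688117 datagrams received
        0 with incomplete header
        10 with bad data length field
        994 with bad checksum
        10922 dropped due to no socket
        874709 broadcast/multicast datagrams dropped due to no socket
        890955 dropped due to full socket buffers
        1667910527 delivered
        2741883224 PCB hash misses
        1632037948 datagrams output
"""

RECEIVED = "datagrams received"
DROPPED = "dropped due to full socket buffers"


def parse_udp_counters(text):
    """Return {description: count} for the lines under the 'udp:' header."""
    counters = {}
    in_udp = False
    for line in text.splitlines():
        stripped = line.strip()
        # An unindented line ending in ':' starts a new protocol section.
        if not line[:1].isspace() and stripped.endswith(":"):
            in_udp = stripped == "udp:"
            continue
        if not in_udp:
            continue
        match = re.match(r"(\d+)\s+(.+)", stripped)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def drop_ratio(counters):
    """Overall fraction of received datagrams dropped on full buffers."""
    received = counters[RECEIVED]
    return counters[DROPPED] / received if received else 0.0


def interval_drop_ratio(before, after):
    """Drop fraction between two snapshots of the cumulative counters."""
    received = after[RECEIVED] - before[RECEIVED]
    dropped = after[DROPPED] - before[DROPPED]
    return dropped / received if received > 0 else 0.0


if __name__ == "__main__":
    print(f"{drop_ratio(parse_udp_counters(SAMPLE)):.2%}")
```

Feeding two snapshots taken, say, a minute apart into
interval_drop_ratio() gives a per-interval figure that can be graphed
or alerted on, rather than the since-boot average.
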
Regards,

- Håvard