The trigger of the behavior I am observing is not putting the machine to sleep but shutting down the interface that connects the machine to the upstream DNS servers (which coincidentally happens when putting the machine to sleep). After I shut down the upstream interface of the machine and re-enable it, daemon->servers remains NULL regardless of the contents of /etc/resolv.conf. Sending SIGHUP, or touching /etc/resolv.conf, causes dnsmasq to re-read /etc/resolv.conf, but daemon->servers remains NULL through all this.

I've added instrumentation all over the place because I don't know the code, and this is what I am seeing:

External interface is disabled:

Oct 20 10:39:39 chapilu dnsmasq[198384]: /etc/resolv.conf: # Generated by NetworkManager
Oct 20 10:39:39 chapilu dnsmasq[198384]: cleanup_servers(): on entry daemon->servers = 0x563729effa20
Oct 20 10:39:39 chapilu dnsmasq[198384]: cleanup_servers(): on exit daemon->servers = (nil)
Oct 20 10:39:39 chapilu dnsmasq[198384]: no servers found in /etc/resolv.conf, will retry

Expected. Note daemon->servers = (nil) because there are no servers in resolv.conf.

Then I enabled the interface:

Oct 20 10:39:51 chapilu dnsmasq[198384]: /etc/resolv.conf: # Generated by NetworkManager
Oct 20 10:39:51 chapilu dnsmasq[198384]: /etc/resolv.conf: search example.com
Oct 20 10:39:51 chapilu dnsmasq[198384]: /etc/resolv.conf: nameserver 64.102.6.247
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): adding server 64.102.6.247
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): adding server via add_update_server()
Oct 20 10:39:51 chapilu dnsmasq[198384]: add_update_server(): flags = 2048, daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: add_update_server(): added server to tail.
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): add_update_server() returned 1; daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: /etc/resolv.conf: nameserver 173.37.137.85
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): adding server 173.37.137.85
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): adding server via add_update_server()
Oct 20 10:39:51 chapilu dnsmasq[198384]: add_update_server(): flags = 2048, daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: add_update_server(): added server to tail.
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): add_update_server() returned 1; daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: /etc/resolv.conf: nameserver 173.37.142.73
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): adding server 173.37.142.73
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): adding server via add_update_server()
Oct 20 10:39:51 chapilu dnsmasq[198384]: add_update_server(): flags = 2048, daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: add_update_server(): added server to tail.
Oct 20 10:39:51 chapilu dnsmasq[198384]: reload_servers(): add_update_server() returned 1; daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: cleanup_servers(): on entry daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: cleanup_servers(): on exit daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: reading /etc/resolv.conf
Oct 20 10:39:51 chapilu dnsmasq[198384]: check_servers(): daemon->servers is NULL!
Oct 20 10:39:51 chapilu dnsmasq[198384]: check_servers(): 0 servers in daemon->servers
Oct 20 10:39:51 chapilu dnsmasq[198384]: check_servers(): daemon->local_domains is NULL!
Oct 20 10:39:51 chapilu dnsmasq[198384]: check_servers(): 0 servers in daemon->local_domains
Oct 20 10:39:51 chapilu dnsmasq[198384]: cleanup_servers(): on entry daemon->servers = (nil)
Oct 20 10:39:51 chapilu dnsmasq[198384]: cleanup_servers(): on exit daemon->servers = (nil)

So, there are valid servers to be added to the daemon->servers linked list, but upon return from add_update_server(), daemon->servers is NULL. That doesn't seem right.

Looking at recent changes to add_update_server() I found:

https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commitdiff;h=eb88eed1fc8ed246e9355531c2715fa2f7738afc

I have reverted that commit and now bouncing the external interface does not cause add_update_server() to leave daemon->servers with NULL, and things work.
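
To illustrate the kind of failure the log is consistent with ("added server to tail" is reported, yet the list head stays NULL), here is a minimal sketch of a list that caches a tail pointer. It is not dnsmasq's code: struct node, struct list and every function name below are hypothetical stand-ins, with head playing the role of daemon->servers and tail the role of a cached tail pointer. Whether dnsmasq actually fails this way is an assumption I have not verified. The nodes are caller-owned so the sketch stays well-defined; with heap-allocated nodes the stale write would be a use-after-free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not dnsmasq's real types. */
struct node {
  int id;
  struct node *next;
};

struct list {
  struct node *head; /* plays the role of daemon->servers */
  struct node *tail; /* plays the role of a cached tail pointer */
};

/* Append that trusts the cached tail without looking at head. */
void append_trusting_tail(struct list *l, struct node *n)
{
  n->next = NULL;
  if (l->tail)
    l->tail->next = n; /* a stale tail links n onto a node that is off-list */
  else
    l->head = n;
  l->tail = n;
}

/* Append that only uses the tail when the list is non-empty. */
void append_checked(struct list *l, struct node *n)
{
  n->next = NULL;
  if (l->head && l->tail)
    l->tail->next = n;
  else
    l->head = n;
  l->tail = n;
  assert(l->head != NULL); /* post-condition: an append never leaves head NULL */
}

/* Empties the list but forgets to reset the cached tail. */
void clear_forgetting_tail(struct list *l)
{
  l->head = NULL;
}

/* Empties the list and resets the cached tail as well. */
void clear_resetting_tail(struct list *l)
{
  l->head = NULL;
  l->tail = NULL;
}

/* Number of nodes reachable from head. */
size_t list_count(const struct list *l)
{
  size_t count = 0;
  const struct node *n;
  for (n = l->head; n; n = n->next)
    count++;
  return count;
}
```

With this sketch, calling clear_forgetting_tail() and then append_trusting_tail() leaves head NULL even though the append "succeeded", which matches the shape of the log above. Either resetting the tail when the list is emptied or checking head before trusting the tail avoids that.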

I believe there is something wrong with the daemon->servers_tail logic introduced by the above commit. I'll try to determine what is wrong with the logic but please feel free to beat me to it because it might take me a little while.

Cheers,

Eloy Paris.-

On Wed, Oct 20, 2021 at 05:35:47AM -0400, Eloy Paris wrote:
> On Tue, Oct 19, 2021 at 01:13:46PM -0400, Eloy Paris wrote:
> >
> > I am seeing the new issue happen very often -- the machine goes to sleep
> > and when it comes back it seems like dnsmasq does not have upstream
> > servers to forward requests to, so the virtual machine that relies on
> > dnsmasq for DNS resolution cannot resolve anything.
>
> I've done some troubleshooting of this and my /etc/resolv.conf seems
> stable when the machine comes back from sleep and dnsmasq reads it.
> However, for some reason the servers there don't seem to be added to
> the daemon->servers linked list.
>
> dnsmasq.c:poll_resolv() has:
>
> ----------------------------------------------------------------------
>   if (latest)
>     {
>       static int warned = 0;
>       if (reload_servers(latest->name))
>         {
>           my_syslog(LOG_INFO, _("reading %s"), latest->name);
>           warned = 0;
>           check_servers(0);
> ----------------------------------------------------------------------
>
> I instrumented check_servers(), as that is what logs "using nameserver
> xyz", and my syslog has this in the working case (before I put the
> machine to sleep):
>
> Oct 20 05:16:32 chapilu dnsmasq[167055]: /etc/resolv.conf: search example.com
> Oct 20 05:16:32 chapilu dnsmasq[167055]: /etc/resolv.conf: nameserver 1.2.3.4
> Oct 20 05:16:32 chapilu dnsmasq[167055]: /etc/resolv.conf: nameserver 1.2.3.5
> Oct 20 05:16:32 chapilu dnsmasq[167055]: /etc/resolv.conf: nameserver 1.2.3.6
> Oct 20 05:16:32 chapilu dnsmasq[167055]: reading /etc/resolv.conf
> Oct 20 05:16:32 chapilu dnsmasq[167055]: check_servers(): Server #1: domain = , interface =
> Oct 20 05:16:32 chapilu dnsmasq[167055]: using nameserver 1.2.3.4#53
> Oct 20 05:16:32 chapilu dnsmasq[167055]: check_servers(): Server #2: domain = , interface =
> Oct 20 05:16:32 chapilu dnsmasq[167055]: using nameserver 1.2.3.5#53
> Oct 20 05:16:32 chapilu dnsmasq[167055]: check_servers(): Server #3: domain = , interface =
> Oct 20 05:16:32 chapilu dnsmasq[167055]: using nameserver 1.2.3.6#53
> Oct 20 05:16:32 chapilu dnsmasq[167055]: check_servers(): 3 servers in daemon->servers
> Oct 20 05:16:32 chapilu dnsmasq[167055]: check_servers(): 0 servers in daemon->local_domains
>
> (Doing "touch /etc/resolv.conf" when things are working [before I put
> the machine to sleep], produces the above as well.)
>
> However, when I put the machine to sleep, and later resume, I get this:
>
> Oct 20 05:17:26 chapilu dnsmasq[167055]: /etc/resolv.conf: # Generated by NetworkManager
> Oct 20 05:17:26 chapilu dnsmasq[167055]: /etc/resolv.conf: search example.com
> Oct 20 05:17:26 chapilu dnsmasq[167055]: /etc/resolv.conf: nameserver 1.2.3.4
> Oct 20 05:17:26 chapilu dnsmasq[167055]: /etc/resolv.conf: nameserver 1.2.3.5
> Oct 20 05:17:26 chapilu dnsmasq[167055]: /etc/resolv.conf: nameserver 1.2.3.6
> Oct 20 05:17:26 chapilu dnsmasq[167055]: reading /etc/resolv.conf
> Oct 20 05:17:26 chapilu dnsmasq[167055]: check_servers(): 0 servers in daemon->servers
> Oct 20 05:17:26 chapilu dnsmasq[167055]: check_servers(): 0 servers in daemon->local_domains
>
> So it seems like after the resume from sleep, /etc/resolv.conf is
> stable, has the correct upstream DNS servers, but somehow
> daemon->servers ends up with nothing; what could cause this?
>
> Cheers,
>
> Eloy Paris.-