On what basis do you think that your clients are getting "connection refused"? That's a specific ICMP error which typically originates in the kernel, because there's nothing actually listening on a port.
Also, are you using TCP connections? If not, there's no connection as such: the client sends a single UDP packet with the query and, some time later, it probably gets a single UDP packet in reply. The reply may never come because UDP is unreliable, either to and from dnsmasq, or to and from the upstream server it forwards to. Under heavy load, UDP packets may be dropped by the kernel too.

Your error message

ERROR: logging before flag.Parse: W0517 03:19:50.139060 1 server.go:53] Error getting metrics from dnsmasq: read udp 127.0.0.1:36181->127.0.0.1:53: i/o timeout

implies that the client is timing out awaiting the UDP reply. Does it retry the query under those circumstances? If it doesn't, that's your problem: you're assuming UDP is reliable, when it ain't.

Cheers,

Simon.

On 18/05/17 23:45, Guido Pepper wrote:
> Hello.
> We are running dnsmasq version
>
> /usr/sbin/dnsmasq --version
> Dnsmasq version 2.76 Copyright (c) 2000-2016 Simon Kelley
> Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP
> DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect
> inotify
>
> We run dnsmasq in our kubernetes (https://kubernetes.io/) clusters to
> perform DNS resolution for the container based services running in the
> cluster. I wrote up a bigger picture overview of our situation here
> http://stackoverflow.com/q/44030167/6067470.
>
> The key points are that the applications running in our clusters
> experience intermittent name resolution errors. At the same time that
> 1 or more applications have a name resolution error we get connection
> refused errors from an application that is querying dnsmaq for it's
> metrics (eg: dig +short chaos txt cachesize.bind). I'm thinking that
> the DNS failures we are seeing is that dnsmasq is refusing the
> connection. I'm hoping someone can point me in a direction to get to
> the root of these issues. The only thought I have is to run dnsmasq
> in debug mode in the hopes that when connections are not being
> accepted something will get logged that would be a clue as to why this
> is happening. I'm wondering if that's a sound approach or if anyone
> has alternate ideas for me to move this situation forward.
>
> Thanks for listening!
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasqemail@example.com
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
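
To make the retry point in the reply concrete, here is a minimal sketch of a client that resends a UDP query when no answer arrives. It is written in Python purely for illustration (the metrics client in the log above is a Go program), and the function names, the three attempts and the one-second timeout are assumptions, not anything taken from dnsmasq or that client. The default query mirrors the dig command in the quoted message (TXT record, class CHAOS).

```python
import socket
import struct


def build_query(name, qtype=16, qclass=3, query_id=0x1234):
    """Build a minimal DNS query packet; defaults are TXT (16) in class CHAOS (3)."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


def query_with_retry(packet, server=("127.0.0.1", 53), attempts=3, timeout=1.0):
    """Send packet over UDP, resending on timeout. Returns the raw reply or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(attempts):
            sock.sendto(packet, server)
            try:
                reply, _addr = sock.recvfrom(4096)
            except socket.timeout:
                continue  # query or reply was lost somewhere: send it again
            if reply[:2] == packet[:2]:  # accept only a reply carrying our query ID
                return reply
    return None  # every attempt timed out
```

Calling `query_with_retry(build_query("cachesize.bind"))` would send the query to 127.0.0.1:53 up to three times. A single timeout then no longer counts as a failure; only a run of consecutive timeouts does, which is the more useful signal when looking for dropped packets.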