Running dnsmasq with these options:
/usr/sbin/dnsmasq -k --cache-size=50 --log-facility=- --user=nobody
--group=nobody --no-hosts --neg-ttl=60 --max-ttl=240 --max-cache-ttl=300

There is no local dnsmasq config file, so that's literally all the config
other than the defaults applied by dnsmasq.

dnsmasq -v
Dnsmasq version 2.78  Copyright (c) 2000-2017 Simon Kelley
Compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6
no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify

This is running in Kubernetes: the dnsmasq instances run as sidecar
containers to the main application container. I have multiple replicas of
the same deployment running in the cluster, so they're all at the same
versions and receiving equal amounts of traffic via load balancing. They all
talk to the same outbound endpoints and do the same workload. I've sent a
SIGUSR1 signal to each of the pods individually (all pods have been running
for approx 48 hours, bar pod4, which has been running for fewer hours):

I0608 15:11:34.998127       1 nanny.go:116] dnsmasq[19]: time 1528470694
I0608 15:11:34.998169       1 nanny.go:116] dnsmasq[19]: cache size 50, 0/2267416 cache insertions re-used unexpired cache entries.
I0608 15:11:34.998175       1 nanny.go:116] dnsmasq[19]: queries forwarded 3218560, queries answered locally 3182486
I0608 15:11:34.998180       1 nanny.go:116] dnsmasq[19]: queries for authoritative zones 0
I0608 15:11:34.998184       1 nanny.go:116] dnsmasq[19]: server queries sent 3218560, retried or failed 16

I0608 15:11:35.909168       1 nanny.go:116] dnsmasq[18]: time 1528470695
I0608 15:11:35.909206       1 nanny.go:116] dnsmasq[18]: cache size 50, 0/197465 cache insertions re-used unexpired cache entries.
I0608 15:11:35.909211       1 nanny.go:116] dnsmasq[18]: queries forwarded 240843, queries answered locally 6159789
I0608 15:11:35.909216       1 nanny.go:116] dnsmasq[18]: queries for authoritative zones 0
I0608 15:11:35.909219       1 nanny.go:116] dnsmasq[18]: server queries sent 240843, retried or failed 4

I0608 15:11:36.948015       1 nanny.go:116] dnsmasq[20]: time 1528470696
I0608 15:11:36.948083       1 nanny.go:116] dnsmasq[20]: cache size 50, 0/63648 cache insertions re-used unexpired cache entries.
I0608 15:11:36.948138       1 nanny.go:116] dnsmasq[20]: queries forwarded 46004, queries answered locally 6347223
I0608 15:11:36.948188       1 nanny.go:116] dnsmasq[20]: queries for authoritative zones 0
I0608 15:11:36.948219       1 nanny.go:116] dnsmasq[20]: server queries sent 46004, retried or failed 1

I0608 15:11:38.032330       1 nanny.go:116] dnsmasq[24]: time 1528470698
I0608 15:11:38.032374       1 nanny.go:116] dnsmasq[24]: cache size 50, 0/1359727 cache insertions re-used unexpired cache entries.
I0608 15:11:38.032382       1 nanny.go:116] dnsmasq[24]: queries forwarded 1939395, queries answered locally 742411
I0608 15:11:38.032388       1 nanny.go:116] dnsmasq[24]: queries for authoritative zones 0
I0608 15:11:38.032394       1 nanny.go:116] dnsmasq[24]: server queries sent 1939395, retried or failed 7

The problem I notice is that some pods (pod1, pod2, pod4) are forwarding far
more requests than the other pods (pod3, with only 46004 queries forwarded,
is an example of what the others look like). I'm not sure what has caused
this, seeing as the application is doing the same thing across all pods. The
only thing I notice here is that pods 1/2/4 all have a number of "retried or
failed" queries (16, 4 and 7), which the other pods largely don't (pod3
shows just 1). I therefore wonder whether those pods have sent so many more
requests upstream, instead of hitting the cache, because a "retried or
failed" query somehow stops the cache from working for a decent period of
time. I've not been able to find anything (via Google) relating to this
scenario, but it's the only thing that makes sense to me right now. I can
accept a couple of failed lookups here and there, but one failure (if I'm
onto something, that is) seems to then cause no cache hits for a long period
of time?
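
To put rough numbers on the comparison, here is a minimal Python sketch that
pulls the counters out of a SIGUSR1 statistics dump and works out what share
of queries went upstream. The function names (parse_stats, forwarded_share,
failure_rate) and the PODS table are my own for illustration, not anything
from dnsmasq, and the pod labels assume the dumps above are listed in pod
order (pod1 to pod4).

```python
import re

# Patterns for two lines of a dnsmasq SIGUSR1 statistics dump. Using \s+
# tolerates the line wrapping a mail client may introduce.
FORWARDED_RE = re.compile(
    r"queries forwarded\s+(\d+),\s+queries answered locally\s+(\d+)")
SENT_RE = re.compile(
    r"server queries sent\s+(\d+),\s+retried or failed\s+(\d+)")


def parse_stats(dump):
    """Return the counters found in one statistics dump as a dict of ints."""
    fwd = FORWARDED_RE.search(dump)
    sent = SENT_RE.search(dump)
    if fwd is None or sent is None:
        raise ValueError("not a complete dnsmasq statistics dump")
    return {
        "forwarded": int(fwd.group(1)),
        "local": int(fwd.group(2)),
        "sent": int(sent.group(1)),
        "failed": int(sent.group(2)),
    }


def forwarded_share(stats):
    """Fraction of all queries sent upstream rather than answered locally."""
    total = stats["forwarded"] + stats["local"]
    return stats["forwarded"] / total if total else 0.0


def failure_rate(stats):
    """Fraction of upstream queries that were retried or failed."""
    return stats["failed"] / stats["sent"] if stats["sent"] else 0.0


# Counters copied from the four dumps. ASSUMPTION: the dumps are listed in
# pod order; the post does not say so explicitly.
PODS = {
    "pod1": {"forwarded": 3218560, "local": 3182486, "sent": 3218560, "failed": 16},
    "pod2": {"forwarded": 240843, "local": 6159789, "sent": 240843, "failed": 4},
    "pod3": {"forwarded": 46004, "local": 6347223, "sent": 46004, "failed": 1},
    "pod4": {"forwarded": 1939395, "local": 742411, "sent": 1939395, "failed": 7},
}

for name, stats in PODS.items():
    print(f"{name}: forwarded {forwarded_share(stats):.1%}, "
          f"retried/failed {failure_rate(stats):.5%}")
```

On the figures above this works out to a forwarded share of roughly 50% for
pod1, 4% for pod2, 1% for pod3 and 72% for pod4, while the retried-or-failed
rate is tiny on every pod (at most about 0.002% of upstream queries).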

Thanks and Best,
Dnsmasq-discuss mailing list