Hi Fredrik,

Have you noticed this setting in dnsdist?

  setUDPTimeout(num)
  Set the maximum time dnsdist will wait for a response from a backend over
  UDP, in seconds. Defaults to 2.

I'm not sure whether timeouts are classified as drops. My guess is they
probably are, because dnsdist didn't get a response in time.

Since your backend is a recursor, there are times when it cannot reach an
authoritative server, or encounters one that is not responding. Unbound
applies an exponential backoff when querying such servers; I think it starts
at 10s.
https://nlnetlabs.nl/documentation/unbound/info-timeout/

I would suggest you set setUDPTimeout(10) in dnsdist. Frankly, if Unbound
cannot respond to you in under 10 seconds, the target authoritative server is
most likely not responding at all.
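
In dnsdist.conf that could look something like this (just a sketch; the
backend addresses are simply the local workers from your showServers()
output):

  -- dnsdist.conf (sketch): give Unbound more time before a query is
  -- counted as a timeout
  setUDPTimeout(10)   -- wait up to 10 seconds for a UDP reply from a backend
  newServer({address="127.0.0.1:53", name="worker1"})
  newServer({address="[::1]:53", name="worker2"})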

As to why one server has more drops than the others: assuming both servers
handle approximately the same number of queries/s, and have the same Unbound
configuration and hardware, note that if the two servers reach the Internet
via different ISPs, their relative network speeds can cause differences in
response times.
Beyond that, I would suggest checking whether the following kernel settings
are the same on both; a quick way to compare them is sketched after the list.
Note these are CentOS 7 settings; I'm not sure what the Debian equivalents are.
net.core.rmem_default
net.core.wmem_default
net.core.rmem_max
net.core.wmem_max
net.netfilter.nf_conntrack_udp_timeout
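
A quick way to compare them on each host (just a sketch; the -w values below
are only illustrative, not recommendations):

  # print the current values on both hosts and diff the output
  sysctl net.core.rmem_default net.core.wmem_default \
         net.core.rmem_max net.core.wmem_max \
         net.netfilter.nf_conntrack_udp_timeout

  # example: raise the socket buffer maximums (pick values that fit your traffic)
  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216
  # put them in /etc/sysctl.d/*.conf to make them persistent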

Also, as a general rule, turn off connection tracking for DNS UDP/TCP packets
in your firewall rules:
https://kb.isc.org/docs/aa-01183
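
With iptables that could look roughly like this (a sketch only, assuming DNS
on port 53; adapt it to your actual rule set, and repeat with ip6tables for
IPv6):

  # raw table NOTRACK rules: skip conntrack for DNS traffic (see the ISC KB above)
  iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
  iptables -t raw -A PREROUTING -p udp --sport 53 -j NOTRACK
  iptables -t raw -A OUTPUT     -p udp --sport 53 -j NOTRACK
  iptables -t raw -A OUTPUT     -p udp --dport 53 -j NOTRACK
  # repeat for -p tcp if you also want to skip tracking for DNS over TCP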


Regards,

Michael

-----Original Message-----
From: dnsdist <[email protected]> On Behalf Of Fredrik 
Pettai via dnsdist
Sent: Thursday, March 5, 2020 6:14 PM
To: [email protected]
Subject: [dnsdist] dnsdist Drops, revisited

Hi list,

I’m curious about the “high” number of Drops I see on one dnsdist 1.4.0
(Debian-derived packages) frontend compared to the other(s). I’m guessing the
main reason is the workload, which is different (the services/servers that use
this resolver make it drop more).

I don’t find the “high” Drops numbers satisfying, but perhaps these numbers
are about average?
Anyway, I’d like to improve those numbers if possible. Here are some stats
from two dnsdist frontends:

> showServers()
#   Name                 Address                       State     Qps    Qlim Ord Wt    Queries   Drops Drate   Lat Outstanding Pools
0   worker1              127.0.0.1:53                     up    73.7       0   1  1     565950     278   0.0   0.5           0
1   worker2              [::1]:53                         up    55.7       0   1  1     584273     294   0.0   1.1           0

While one of our bigger servers doesn’t perform as well (in terms of Drops 
ratio):

> showServers()
#   Name                 Address                       State     Qps    Qlim Ord Wt    Queries   Drops Drate   Lat Outstanding Pools
0   worker1              127.0.0.1:53                     up    43.8       0   1  1    1054047   12728   0.0  31.1           4
1   worker2              127.0.0.1:53                     up    43.8       0   1  1    1064823   12823   0.0  17.5           4
2   worker3              [::1]:53                         up    20.9       0   1  1    1054548   12773   0.0  38.5           2
3   worker4              [::1]:53                         up    35.8       0   1  1    1081502   12854   0.0  48.9           3

FW & dnsdist rules are almost nonexistent, and the configuration is the same
on both of the above systems (actually there are more active rules, and even
Lua code, on the “fast” dnsdist system).

I only found one earlier thread on this topic, and it didn't describe a way to
improve the situation, just how one might look into what the underlying issues
could be...

http://powerdns.13854.n7.nabble.com/dnsdist-drops-packet-td11974.html
(https://mailman.powerdns.com/pipermail/dnsdist/2016-January/000052.html)

dumpStats() from the above server:

> dumpStats()
acl-drops                         0    latency0-1                   3620405
cache-hits                        0    latency1-10                    59808
cache-misses                      0    latency10-50                  132513
cpu-sys-msec                 749565    latency100-1000               386909
cpu-user-msec                470696    latency50-100                 101861
downstream-send-errors            0    no-policy                          0
downstream-timeouts           52571    noncompliant-queries               0
dyn-block-nmg-size                0    noncompliant-responses             0
dyn-blocked                       0    queries                      4382032
empty-queries                     0    rdqueries                    4382007
fd-usage                         42    real-memory-usage          315129856
frontend-noerror            3254422    responses                    4329454
frontend-nxdomain            902996    rule-drop                          0
frontend-servfail            172012    rule-nxdomain                      0
latency-avg100                41936.3  rule-refused                       0
latency-avg1000               44165.7  rule-servfail                      0
latency-avg10000              43366.6  security-status                    0
latency-avg1000000            41994.4  self-answered                      1
latency-count               4329455    servfail-responses            172012
latency-slow                  27681    special-memory-usage        95940608
latency-sum               172860695    trunc-failures                     0

> topSlow(10, 1000)
   1  uyrg.com.                                  69 46.9%
   2  115.61.96.156.in-addr.arpa.                19 12.9%
   3  nhu.edu.tw.                                 9  6.1%
   4  nbkailan.com.                               8  5.4%
   5  aikesi.com.                                 8  5.4%
   6  168.122.238.45.in-addr.arpa.                6  4.1%
   7  45-179-252-62-dynamic.proxyar.com.          4  2.7%
   8  callforarticle.com.                         3  2.0%
   9  default._domainkey.nhu.edu.tw.              3  2.0%
  10  205.78.127.180.in-addr.arpa.                3  2.0%
  11  Rest                                       15 10.2%

(Many are probably spammy relay IPs, sending domains, etc.)

Is there a way to optimise the dnsdist configuration, for instance by making a
slow path, either for the slow queries or for the clients that send them?

(Also, it's Unbound in the backend of all the dnsdist frontends, and it's
caching heavily, including expired answers.)

Re,
/P

_______________________________________________
dnsdist mailing list
[email protected]
https://mailman.powerdns.com/mailman/listinfo/dnsdist
