Hi Michael,

Thanks for your comments, inline answers below.

> On 6 Mar 2020, at 05:42, Michael Van Der Beek <[email protected]> wrote:
> 
> Hi Fredrik,
> 
> Have you noticed this setting on dnsdist.
> setUDPTimeout(num)

Yes, I did, but I hadn’t played around with it before I sent the email to the
mailing list.

> Set the maximum time dnsdist will wait for a response from a backend over 
> UDP, in seconds. Defaults to 2
> I'm not sure if timeouts are classified as drops. My guess is yes, because
> it didn't get a response in time.

Yes they are.

> Since your backend is a recursor, there are times when the recursor cannot
> reach, or encounters, a non-responsive authoritative server. Unbound has an
> exponential backoff when querying such servers. I think it starts with 10s.
> https://nlnetlabs.nl/documentation/unbound/info-timeout/
> 
> I would suggest you set the dnsdist setUDPTimeout(10); frankly, if Unbound
> cannot respond to you in < 10 seconds, most likely the target authoritative
> server is not responding.

Good point. While I didn’t turn to the unbound documentation (thanks for the 
pointer), I did play around with the UDP timeout setting yesterday: 
first increasing it to setUDPTimeout(5), which yielded better results in terms of 
Drops (at the cost of increased latency), and then later to 15, just to be sure that 
unbound really should be done with its queries, and noticed that the Drops became a 
lot less frequent (and latency increased again). But as you suggest, setUDPTimeout(10) is 
probably the ultimate setting.
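For anyone following along at home, the relevant bit of the dnsdist (Lua) configuration would look roughly like this; the backend addresses are just illustrative, not my actual setup:

```lua
-- Wait up to 10 s for a UDP reply from a backend before counting it
-- as a drop (default is 2 s). Must be set before queries start flowing.
setUDPTimeout(10)

-- Example backends (addresses illustrative)
newServer({address="127.0.0.1:53", name="worker1"})
newServer({address="[::1]:53",     name="worker2"})
```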

> showServers()
#   Name                 Address                       State     Qps    Qlim Ord Wt    Queries   Drops Drate   Lat Outstanding Pools
0   worker1              127.0.0.1:53                     up    20.9       0   1  1    1527539    6814   0.0  65.4           2
1   worker2              127.0.0.1:53                     up    21.9       0   1  1    1553971    6910   2.0  59.4           3
2   worker3              [::1]:53                         up    15.9       0   1  1    1528862    6793   1.0  51.1           3
3   worker4              [::1]:53                         up    54.7       0   1  1    1523692    6880   1.0  53.3           3

Drops are now well below 1%.
So I think we have the main answer there: a UDP timeout discrepancy (vs resolver 
workloads).

> As to why one server has more drops than the others...
> Assuming both servers have approximately the same number of queries/s,
> and the two servers have the same config (for unbound) and hardware:
> note that if the two servers are going via different ISPs, their relative
> network speed can cause differences in response times.
> Then I would suggest you look at some of these settings to see if they are
> the same.
> Note these are CentOS 7 settings. I'm not sure what the Debian equivalents
> are.
> net.core.rmem_default
> net.core.wmem_default
> net.core.rmem_max
> net.core.wmem_max

Right, those were already increased on all the systems prior to deploying dnsdist 
(as per https://nlnetlabs.nl/documentation/unbound/howto-optimise/)
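For reference, the socket buffer bump from that guide boils down to something like the following; the exact sizes below are examples only, and should be tuned to your query load:

```shell
# /etc/sysctl.d/90-dns-buffers.conf -- example values, tune to your load
# Larger socket buffers reduce UDP drops under query bursts.
net.core.rmem_default = 4194304
net.core.wmem_default = 4194304
net.core.rmem_max     = 8388608
net.core.wmem_max     = 8388608

# Apply without rebooting:
#   sysctl --system
```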

> net.netfilter.nf_conntrack_udp_timeout
> 
> Also generally, turn off connection tracking for udp/tcp packets via your 
> firewall rules.
> https://kb.isc.org/docs/aa-01183

Yes, it’s already turned off via rules.
But I also turned off the FWs completely in my tests, with no noticeable 
difference in Drops (though I didn’t measure whether there was any difference in 
latency).
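For completeness, the conntrack bypass from the ISC article amounts to raw-table NOTRACK rules along these lines (port 53 on the resolver; adjust addresses/interfaces to taste):

```shell
# Skip connection tracking for DNS traffic (raw table).
# Example rules only -- adapt match criteria to your setup.
iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK
iptables -t raw -A OUTPUT     -p udp --sport 53 -j NOTRACK
iptables -t raw -A PREROUTING -p tcp --dport 53 -j NOTRACK
iptables -t raw -A OUTPUT     -p tcp --sport 53 -j NOTRACK
```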

> Regards,
> 
> Michael
> 
> -----Original Message-----
> From: dnsdist <[email protected]> On Behalf Of Fredrik 
> Pettai via dnsdist
> Sent: Thursday, March 5, 2020 6:14 PM
> To: [email protected]
> Subject: [dnsdist] dnsdist Drops, revisited
> 
> Hi list,
> 
> I’m curious about the “high” amount of Drops I see on one dnsdist 1.4.0 (Debian 
> derived packages) frontend compared to the other(s). I’m guessing the main 
> reason is workload, which is different (different services/servers use the 
> resolver that Drops more).
> 
> I don’t find the “high” Drops numbers satisfying, but perhaps these numbers 
> are about normal average? 
> Anyway, I'd like to improve those numbers if possible. Here are some 
> stats from two dnsdist frontends:
> 
>> showServers()
> #   Name                 Address                       State     Qps    Qlim Ord Wt    Queries   Drops Drate   Lat Outstanding Pools
> 0   worker1              127.0.0.1:53                     up    73.7       0   1  1     565950     278   0.0   0.5           0
> 1   worker2              [::1]:53                         up    55.7       0   1  1     584273     294   0.0   1.1           0
> 
> While one of our bigger servers doesn’t perform as well (in terms of Drops 
> ratio):
> 
>> showServers()
> #   Name                 Address                       State     Qps    Qlim Ord Wt    Queries   Drops Drate   Lat Outstanding Pools
> 0   worker1              127.0.0.1:53                     up    43.8       0   1  1    1054047   12728   0.0  31.1           4
> 1   worker2              127.0.0.1:53                     up    43.8       0   1  1    1064823   12823   0.0  17.5           4
> 2   worker3              [::1]:53                         up    20.9       0   1  1    1054548   12773   0.0  38.5           2
> 3   worker4              [::1]:53                         up    35.8       0   1  1    1081502   12854   0.0  48.9           3
> 
> There are almost no FW & dnsdist rules, and the configuration is the same on 
> both of the above systems (actually there are more active rules and even 
> Lua code on the “fast” dnsdist system).
> 
> I just found one earlier thread on the topic, and it didn’t describe a way to 
> improve the situation, just how one might look into what the underlying 
> issues could be...
> 
> http://powerdns.13854.n7.nabble.com/dnsdist-drops-packet-td11974.html
> (https://mailman.powerdns.com/pipermail/dnsdist/2016-January/000052.html)
> 
> dumpStats from the above server
> 
>> dumpStats()
> acl-drops                               0    latency0-1                    3620405
> cache-hits                              0    latency1-10                     59808
> cache-misses                            0    latency10-50                   132513
> cpu-sys-msec                       749565    latency100-1000                386909
> cpu-user-msec                      470696    latency50-100                  101861
> downstream-send-errors                  0    no-policy                           0
> downstream-timeouts                 52571    noncompliant-queries                0
> dyn-block-nmg-size                      0    noncompliant-responses              0
> dyn-blocked                             0    queries                       4382032
> empty-queries                           0    rdqueries                     4382007
> fd-usage                               42    real-memory-usage           315129856
> frontend-noerror                  3254422    responses                     4329454
> frontend-nxdomain                  902996    rule-drop                           0
> frontend-servfail                  172012    rule-nxdomain                       0
> latency-avg100                      41936.3  rule-refused                        0
> latency-avg1000                     44165.7  rule-servfail                       0
> latency-avg10000                    43366.6  security-status                     0
> latency-avg1000000                  41994.4  self-answered                       1
> latency-count                     4329455    servfail-responses             172012
> latency-slow                        27681    special-memory-usage         95940608
> latency-sum                     172860695    trunc-failures                      0
> 
>> topSlow(10, 1000)
>   1  uyrg.com.                                  69 46.9%
>   2  115.61.96.156.in-addr.arpa.                19 12.9%
>   3  nhu.edu.tw.                                 9  6.1%
>   4  nbkailan.com.                               8  5.4%
>   5  aikesi.com.                                 8  5.4%
>   6  168.122.238.45.in-addr.arpa.                6  4.1%
>   7  45-179-252-62-dynamic.proxyar.com.          4  2.7%
>   8  callforarticle.com.                         3  2.0%
>   9  default._domainkey.nhu.edu.tw.              3  2.0%
>  10  205.78.127.180.in-addr.arpa.                3  2.0%
>  11  Rest                                       15 10.2%
> 
> (Many are probably spammy relay IPs, sending domains, etc) 
> 
> Is there a way to optimise the dnsdist configuration, for instance by making a 
> slow path, either for the slow queries or for the clients that send them?
> 
> (Also, it’s unbound in the backend of all the dnsdist frontends, and it’s 
> caching heavily, serving expired answers too).
> 
> Re,
> /P
> 
> 
> 
> 
> 
> _______________________________________________
> dnsdist mailing list
> [email protected]
> https://mailman.powerdns.com/mailman/listinfo/dnsdist
