Re: [dnsdist] dnsdist 1.7.4 Debian Bullseye vs 1.8.4 Bullseye

Aleš Rygl via dnsdist Thu, 05 Oct 2023 01:42:11 -0700

Hi Remi,

On 02. 10. 23 13:53, Remi Gacogne via dnsdist wrote:

Hi Ales,
On 25/09/2023 16:09, Aleš Rygl via dnsdist wrote:
I would like to kindly ask for help or advice. I have just upgraded one of our dnsdist instances from 1.7.4 to 1.8.4 together with an OS upgrade (Debian 11.7 to 12.1). Everything works fine, no issues observed apart from some deprecated config references. What is a big surprise to me is CPU usage. The newer version has nearly two times higher CPU consumption in userspace. I am nearly at 80% CPU with 16 physical cores (was about 40%). We have a lot of TLS (DoT) sessions (30k) and 60k qps in total (30k via DoT) here. The latency measured by dnsdist went up also. We are collecting all the metrics dnsdist produces via graphite, so I can check the counters for what could be wrong.
Wow, that's awful. It's the first time I hear about such a regression, and I really would like to understand what is going on.
1/ Are you using our packages, compiling yourself, or perhaps using the Debian ones?
2/ Do you think it would be possible for you to try downgrading the instance to 1.7.4 on Debian 12.1? It might help us pinpoint whether the issue is related to a system change (I have seen people complain about the performance of OpenSSL 3.0.x compared to 1.1.1x, for example).
3/ Would you mind sharing your configuration?
4/ And finally, do you think it would be possible for you to collect a perf trace on this instance? It would require installing linux-perf, if possible the debug symbols for dnsdist (dnsdist-dbgsym), then running 'perf record --call-graph dwarf -p <pid of running dnsdist process> -o </path/to/output/file>' for a few dozen seconds to collect a trace, stopping it with Ctrl+C and finally getting a report with "perf report -i </path/to/previous/file> --stdio". It should tell us where the CPU usage is going.
Best regards,

Thanks for your response. After some deep documentation reading and config tweaking I am nearly back to the previous values regarding CPU load, apart from latency, which is still higher (1.3ms -> 2.3ms). I suspect the latency is computed in a different way here (I noticed a new set of latency counters for TLS, TCP, etc.). The key configuration parameter is setMaxTCPClientThreads(). Changing anything else (cache shards, number of listeners, etc.) has nearly no impact. We had 256 with 1.7.4, now it is 16. Going up from there means a rapid increase of CPU load; having less than 16 means dropping TCP connections in showTCPStats(), where Queued hits Max Queued. Insane values like 1024 kill the CPU. We have a physical server with 16 physical cores; the OS sees 32 cores.
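
For reference, a minimal sketch of the relevant part of the configuration, assuming everything else in dnsdist.conf stays as it was. Only the one setting that mattered is shown, with the values mentioned above; listeners, pools and cache settings are left out.

```lua
-- dnsdist.conf fragment (sketch only; all other settings omitted)

-- Value used with 1.7.4 on Debian 11:
-- setMaxTCPClientThreads(256)

-- Value used with 1.8.4 on Debian 12: higher values raised CPU load
-- on this box, lower values led to dropped TCP connections.
setMaxTCPClientThreads(16)
```

After a change, showTCPStats() on the dnsdist console shows whether Queued is reaching Max Queued.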


Back to your questions:

1/ from your repos

2/ yes, I could try it; the thing is that 1.7.4 for Bullseye crashes on Bookworm with TLS enabled, and there are no packages of 1.7.4 for Bookworm in your repo

3/ sure, I will do so
4/ no problem

Best regards

Ales





_______________________________________________
dnsdist mailing list
dnsdist@mailman.powerdns.com
https://mailman.powerdns.com/mailman/listinfo/dnsdist
