And a fun fact from my own experience: I used to run a load balancer on
FreeBSD with OpenSSL built from ports. Somehow I had set the "assembler
optimization" option to "no", so OpenSSL's big-number arithmetic was
implemented the slow way.
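A hedged sketch of how such a comparison can be run with "openssl speed" (these are not the exact commands from the thread; the OPENSSL_ia32cap mask is an x86-64-specific run-time approximation of a no-assembler build, per OpenSSL's OPENSSL_ia32cap documentation, and may not apply on other platforms):

```shell
# Big-number (BN) heavy workload -- this is what suffers most when the
# build has assembler optimizations disabled:
openssl speed -seconds 1 rsa2048

# Bulk cipher used by common AEAD cipher suites:
openssl speed -seconds 1 -evp aes-128-gcm

# On x86-64, approximate a "no assembler" build at run time by masking
# CPU capability bits (0x200000200000000 clears AES-NI and PCLMULQDQ):
OPENSSL_ia32cap="~0x200000200000000" openssl speed -seconds 1 -evp aes-128-gcm
```

Running the same lines on each OS and comparing the throughput columns makes a misconfigured build stand out immediately.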
I was able to find a big fraction of BN functions using the "perf" tool --
something like 25% of the overall impact. Later, I used "openssl speed" to
compare Linux <--> FreeBSD (on the required cipher suites):

How can I interpret openssl speed output? - Stack Overflow
<https://stackoverflow.com/questions/17410270/how-can-i-interpret-openssl-speed-output>

Mon, 23 Jan 2023 at 14:11, Илья Шипицин <chipits...@gmail.com>:

> I would start with a big-picture view:
>
> 1) are the CPUs utilized at 100%?
> 2) what is the CPU usage in detail - the fractions of system, user, idle ...?
>
> That will allow us to narrow things down and find the bottleneck, either
> in kernel space or in user space.
>
> Mon, 23 Jan 2023 at 14:01, Willy Tarreau <w...@1wt.eu>:
>
>> Hi Marc,
>>
>> On Mon, Jan 23, 2023 at 12:13:13AM -0600, Marc West wrote:
>> (...)
>> > I understand that raw performance on OpenBSD is sometimes not as high
>> > as on other OSes in some scenarios, but the difference of 500 vs
>> > 10,000+ req/sec and 1100 vs 40,000 connections here is very large, so
>> > I wanted to see if there are any thoughts, known issues, or tunables
>> > that could possibly help improve HAProxy throughput on OpenBSD?
>>
>> Based on my experience a long time ago (~13-14 years), I remember that
>> PF's connection tracking didn't scale at all with the number of
>> connections. It was very clear that there was a very high per-packet
>> lookup cost, indicating that a hash table was too small. Unfortunately
>> I didn't know how to change such settings, and since my home machine
>> was behind an ADSL line anyway, the line would have been filled long
>> before the hash table, so I didn't really care. But I was a bit shocked
>> by this observation. I suppose that things have significantly evolved
>> since then, but it would be worth having a look around this.
>>
>> > The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
>> > kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
>> > reports those as expected.
>> > PF state limit is raised to 1 million and peaked at 72k in use. BIOS
>> > power profile is set to max performance.
>>
>> I think you should try to flood the machine using UDP traffic to see
>> the difference between the part that happens in the network stack and
>> the part that happens in the rest of the system (haproxy included). If
>> a small UDP flood on accepted ports brings the machine to its knees,
>> it's definitely related to the network stack and/or filtering/tracking.
>> If it does nothing to it, I would tend to say that the lower network
>> layers and PF are innocent. This would leave us with TCP and haproxy.
>> A SYN flood test could be useful; maybe the listening queues are too
>> small and incoming packets are dropped too fast.
>>
>> At the TCP layer, a long time ago OpenBSD used to be a bit extremist
>> in the way it produces random sequence numbers. I don't know how it
>> is today, nor whether this has a significant cost. Similarly, outgoing
>> connections need a random source port, and this can be expensive,
>> particularly when the number of concurrent connections rises and ports
>> become scarce. Though you said that even blocked traffic causes harm
>> to the machine, so I doubt this is your concern for now.
>>
>> > pid = 78180 (process #1, nbproc = 1, nbthread = 32)
>> > uptime = 1d 19h10m11s
>> > system limits: memmax = unlimited; ulimit-n = 200000
>> > maxsock = 200000; maxconn = 99904; maxpipes = 0
>> >
>> > No errors that I can see in logs about hitting any limits. There is
>> > no change in results with http vs https, http/1.1 vs h2, with or
>> > without httplog, or when reducing nbthread on this 40-core machine.
>> > If there are any other details I can provide, please let me know.
>>
>> At least I'm seeing that you're using kqueue, which is a good point.
>>
>> > source 0.0.0.0 usesrc clientip
>>
>> I don't know if it's on purpose that you're using transparent proxying
>> to the servers, but it's very likely that it will increase the
>> processing cost at the lower layers by creating extra states in the
>> network sessions table. Again, this will only have an effect on traffic
>> between haproxy and the servers.
>>
>> > listen test_https
>> >     bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
>>
>> One thing you can try here is to duplicate that line to have multiple
>> listening sockets (or just append "shards X" to specify the number of
>> sockets you want). One of the benefits is that it will multiply the
>> number of listening sockets and hence increase the global queue size.
>> Maybe some of your packets are lost in socket queues, and this could
>> improve the situation.
>>
>> I don't know if you have something roughly equivalent to "perf" on
>> OpenBSD nowadays, as that could prove extremely useful to figure out
>> where the CPU time is spent. Other than that, I'm a bit out of ideas.
>>
>> Willy
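For reference, Willy's "shards" suggestion applied to the quoted config might look like the sketch below ("shards 4" is an example value, not a recommendation from the thread; the address and certificate path are the placeholders already used above; the bind-level "shards" keyword requires HAProxy 2.5 or newer):

```
listen test_https
    # 4 shards -> 4 independent listening sockets, each with its own
    # accept queue, instead of one shared socket for all threads
    bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1 shards 4
```

The equivalent effect can be had by repeating the "bind" line with identical parameters, as Willy notes.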