Also, I wonder how LibreSSL <--> OpenSSL performance compares. I'll try
"openssl speed" (I recall LibreSSL has the same feature), but I'm not sure
I can get an OpenBSD machine.
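Here is a rough way to read "openssl speed" figures for this comparison, sketched in Python under stated assumptions. A full TLS handshake with an RSA certificate costs the server at least one private-key operation, so the single-core "sign/s" value (e.g. from "openssl speed rsa2048") gives an upper bound on full handshakes per second per core. The function names, the linear scaling across cores and the sample numbers are assumptions for illustration, not measurements from this thread.

```python
"""Rough TLS handshake ceiling from `openssl speed` figures (sketch).

Assumptions (not from the thread): the ops/s values are single-core
results such as the `sign/s` column of `openssl speed rsa2048`, every
full handshake costs one private-key operation, and throughput scales
linearly with cores. Real handshakes also pay for ECDHE, I/O and the
proxy itself, so treat the result as an upper bound.
"""


def handshake_ceiling(sign_per_sec: float, cores: int) -> float:
    """Upper bound on full handshakes/s if signing were the only cost."""
    if sign_per_sec <= 0 or cores <= 0:
        raise ValueError("sign_per_sec and cores must be positive")
    return sign_per_sec * cores


def relative_speed(candidate: dict[str, float],
                   baseline: dict[str, float]) -> dict[str, float]:
    """Ratio candidate/baseline for every algorithm measured on both builds."""
    return {
        alg: candidate[alg] / baseline[alg]
        for alg in sorted(candidate.keys() & baseline.keys())
        if baseline[alg] > 0
    }


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    libressl = {"rsa2048 sign": 900.0, "ecdsap256 sign": 20000.0}
    openssl = {"rsa2048 sign": 1800.0, "ecdsap256 sign": 40000.0}
    print("LibreSSL / OpenSSL:", relative_speed(libressl, openssl))
    print("handshake ceiling, 40 cores:",
          handshake_ceiling(libressl["rsa2048 sign"], cores=40))
```

Feeding the per-second columns measured on both hosts into relative_speed() gives a like-for-like ratio. OpenSSL's "openssl speed -multi N" can measure several cores directly instead of assuming linear scaling, if the build at hand supports it. Marc reports no difference between http and https, which already hints that TLS is not the main bottleneck in his case.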
Can you try haproxy + openssl-1.1.1 (it is considered the most performant
these days)?

On Mon, 23 Jan 2023 at 14:17, Илья Шипицин <chipits...@gmail.com> wrote:
> And a fun fact from my own experience:
> I used to run a load balancer on FreeBSD with OpenSSL built from ports.
> Somehow I had set "assembler optimization" to "no", so OpenSSL's
> big-number arithmetic was implemented the slow way.
>
> Using the "perf" tool, I found that BN functions took a big fraction of
> the CPU time, something like 25% of the total.
>
> Later I used "openssl speed" to compare Linux <--> FreeBSD (on the
> required cipher suites).
>
> How can I interpret openssl speed output? - Stack Overflow
> <https://stackoverflow.com/questions/17410270/how-can-i-interpret-openssl-speed-output>
>
> On Mon, 23 Jan 2023 at 14:11, Илья Шипицин <chipits...@gmail.com> wrote:
>
>> I would start with a big-picture view:
>>
>> 1) are the CPUs utilized at 100%?
>> 2) what does CPU usage look like in detail: the fraction of system,
>> user, idle, ...?
>>
>> This will allow us to narrow things down and find where the bottleneck
>> is, in kernel space or user space.
>>
>> On Mon, 23 Jan 2023 at 14:01, Willy Tarreau <w...@1wt.eu> wrote:
>>
>>> Hi Marc,
>>>
>>> On Mon, Jan 23, 2023 at 12:13:13AM -0600, Marc West wrote:
>>> (...)
>>> > I understand that raw performance on OpenBSD is sometimes not as high
>>> > as other OSes in some scenarios, but the difference of 500 vs 10,000+
>>> > req/sec and 1100 vs 40,000 connections here is very large, so I
>>> > wanted to see if there are any thoughts, known issues, or tunables
>>> > that could possibly help improve HAProxy throughput on OpenBSD.
>>>
>>> Based on my experience a long time ago (~13-14 years), I remember that
>>> PF's connection tracking didn't scale at all with the number of
>>> connections. There was very clearly a high per-packet lookup cost,
>>> indicating that a hash table was too small.
>>> Unfortunately I didn't know how to change such settings, and since my
>>> home machine was behind an ADSL line anyway, the line would have been
>>> filled long before the hash table, so I didn't really care. But I was a
>>> bit shocked by this observation. I suppose it has evolved significantly
>>> since then, but it would be worth having a look at this.
>>>
>>> > The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
>>> > kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
>>> > reports those as expected. PF state limit is raised to 1 million and
>>> > peaked at 72k in use. BIOS power profile is set to max performance.
>>>
>>> I think you should try to flood the machine using UDP traffic to see
>>> the difference between the part that happens in the network stack and
>>> the part that happens in the rest of the system (haproxy included). If
>>> a small UDP flood on accepted ports brings the machine to its knees,
>>> it's definitely related to the network stack and/or filtering/tracking.
>>> If it does nothing to it, I would tend to say that the lower network
>>> layers and PF are innocent. This would leave us with TCP and haproxy.
>>> A SYN flood test could be useful; maybe the listening queues are too
>>> small and incoming packets are dropped too fast.
>>>
>>> At the TCP layer, a long time ago OpenBSD used to be a bit extremist
>>> in the way it produces random sequence numbers. I don't know how it
>>> is today, nor whether this has a significant cost. Similarly, outgoing
>>> connections need a random source port, and this can be expensive,
>>> particularly when the number of concurrent connections rises and ports
>>> become scarce, though you said that even blocked traffic causes harm
>>> to the machine, so I doubt this is your concern for now.
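Ilya's question about the system/user/idle split pairs naturally with Willy's flood tests: if system and interrupt time climb while user time (haproxy) stays low, the cost sits in the kernel (network stack, PF). Below is a minimal Python sketch of that calculation. It assumes OpenBSD's "sysctl -n kern.cp_time" prints aggregate tick counters as comma-separated integers in the order user, nice, sys, spin, intr, idle; that field order is an assumption to check against the running release, and the function names are hypothetical.

```python
"""CPU time breakdown from two tick-counter snapshots (sketch).

Assumption to verify on your release: OpenBSD's `sysctl -n kern.cp_time`
prints aggregate tick counters as comma-separated integers in the order
user, nice, sys, spin, intr, idle. Pass another `states` tuple if not.
"""

import re

OPENBSD_STATES = ("user", "nice", "sys", "spin", "intr", "idle")  # assumed order


def parse_ticks(text: str,
                states: tuple[str, ...] = OPENBSD_STATES) -> dict[str, int]:
    """Parse counters separated by commas or whitespace into {state: ticks}."""
    values = [int(v) for v in re.split(r"[,\s]+", text.strip()) if v]
    if len(values) != len(states):
        raise ValueError(f"expected {len(states)} counters, got {len(values)}")
    return dict(zip(states, values))


def breakdown(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percentage of the elapsed ticks spent in each state between snapshots."""
    delta = {state: after[state] - before[state] for state in before}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between the two snapshots")
    return {state: 100.0 * ticks / total for state, ticks in delta.items()}


if __name__ == "__main__":
    # Hypothetical snapshots; on a real box, take two `sysctl -n kern.cp_time`
    # readings a few seconds apart while the load test is running.
    t0 = parse_ticks("100,0,300,0,100,500")
    t1 = parse_ticks("300,0,900,0,300,500")
    for state, pct in breakdown(t0, t1).items():
        print(f"{state:>5}: {pct:5.1f}%")
```

Taking the snapshots while the UDP or SYN flood runs, and again under normal HTTP load, gives the kernel-versus-user-space comparison the thread is after.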
>>>
>>> > pid = 78180 (process #1, nbproc = 1, nbthread = 32)
>>> > uptime = 1d 19h10m11s
>>> > system limits: memmax = unlimited; ulimit-n = 200000
>>> > maxsock = 200000; maxconn = 99904; maxpipes = 0
>>> >
>>> > No errors that I can see in logs about hitting any limits. There is no
>>> > change in results with http vs https, http/1.1 vs h2, with or without
>>> > httplog, or reducing nbthread on this 40-core machine. If there are
>>> > any other details I can provide, please let me know.
>>>
>>> At least I'm seeing you're using kqueue, which is a good point.
>>>
>>> > source 0.0.0.0 usesrc clientip
>>>
>>> I don't know if it's on purpose that you're using transparent proxying
>>> to the servers, but it's very likely that it will increase the
>>> processing cost at the lower layers by creating extra states in the
>>> network sessions table. Again, this will only have an effect for
>>> traffic between haproxy and the servers.
>>>
>>> > listen test_https
>>> > bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
>>>
>>> One thing you can try here is to duplicate that line to have multiple
>>> listening sockets (or just append "shards X" to specify the number of
>>> sockets you want). One of the benefits is that it will multiply the
>>> number of listening sockets, hence increase the global queue size.
>>> Maybe some of your packets are lost in socket queues and this could
>>> improve the situation.
>>>
>>> I don't know if you have something roughly equivalent to "perf" on
>>> OpenBSD nowadays, as that could prove extremely useful to figure out
>>> where the CPU time is spent. Other than that I'm a bit out of ideas.
>>>
>>> Willy
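Willy's point about multiple listening sockets can be illustrated outside HAProxy. Each listening socket has its own accept queue, so N sockets on the same address with a backlog of B each offer roughly N x B pending connections instead of B. The Python sketch below opens N listeners on one port using SO_REUSEPORT. It is an assumption that this mirrors what "shards" does internally on a given platform, and how the kernel spreads new connections across such sockets differs between Linux and the BSDs. The helper names are hypothetical.

```python
"""Open several TCP listeners on one port (sketch of sharded listeners).

Assumption: the platform supports SO_REUSEPORT (Linux and the BSDs do).
Each socket gets its own accept backlog; how the kernel spreads new
connections across the sockets is OS-specific and not modelled here.
"""

import socket


def open_sharded_listeners(host: str, port: int, shards: int,
                           backlog: int) -> list[socket.socket]:
    """Bind `shards` listening sockets to the same (host, port).

    With port=0 the kernel picks a free port for the first socket and
    the remaining sockets join it on that port.
    """
    if shards < 1:
        raise ValueError("need at least one shard")
    listeners: list[socket.socket] = []
    for _ in range(shards):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # Must be set on every socket before bind() for the port to be shared.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.bind((host, port))
            s.listen(backlog)
        except OSError:
            s.close()
            for other in listeners:
                other.close()
            raise
        listeners.append(s)
        port = s.getsockname()[1]  # reuse the port chosen for the first socket
    return listeners


def nominal_queue_capacity(shards: int, backlog: int) -> int:
    """Pending-connection capacity across all shards before any kernel cap."""
    return shards * backlog


if __name__ == "__main__":
    socks = open_sharded_listeners("127.0.0.1", 0, shards=4, backlog=128)
    try:
        print("ports:", {s.getsockname()[1] for s in socks})
        print("nominal queue capacity:", nominal_queue_capacity(4, 128))
    finally:
        for s in socks:
            s.close()
```

In HAProxy terms this corresponds to repeating the bind line, or appending "shards X" where the running version supports that keyword. The kernel still caps each backlog (kern.somaxconn on OpenBSD, net.core.somaxconn on Linux), so nominal_queue_capacity() is only an upper bound.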