And a fun fact from my own experience: I used to run a load balancer on
FreeBSD with OpenSSL built from ports. Somehow I had set "assembler
optimization" to "no", so OpenSSL's big-number arithmetic was implemented
in a slow way.

Using the "perf" tool I was able to see that the BN functions accounted
for a big fraction of the CPU time, something like 25% of the overall load.

Later, I used "openssl speed" to compare Linux <--> FreeBSD (on the
required cipher suites):

How can I interpret openssl speed output? - Stack Overflow
<https://stackoverflow.com/questions/17410270/how-can-i-interpret-openssl-speed-output>
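A minimal sketch of that comparison: run the same "openssl speed" benchmark on each OS and compare the throughput columns. The cipher here (aes-128-gcm) is just an example; substitute the suites you actually negotiate.

```shell
# Symmetric cipher throughput via the EVP interface:
openssl speed -seconds 1 -evp aes-128-gcm
# RSA signing is where slow big-number (BN) code hurts the most:
openssl speed -seconds 1 rsa2048
```

Higher numbers in every column on one OS with the same OpenSSL version usually points at a build-option difference (like the missing assembler optimization above) rather than the OS itself.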

Mon, 23 Jan 2023 at 14:11, Илья Шипицин <chipits...@gmail.com>:

> I would start with big picture view
>
> 1) are the CPUs utilized at 100%?
> 2) what is the CPU usage in detail - fractions of system, user, idle ...?
>
> it will allow us to narrow things down and find where the bottleneck is,
> either in kernel space or user space.
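On Linux, check (2) can be sketched without any extra tools (a hypothetical quick look; vmstat, top or mpstat give the same numbers interactively):

```shell
# The first line of /proc/stat holds cumulative CPU jiffies in the order:
# user nice system idle iowait irq softirq ...
head -1 /proc/stat
# Interactive equivalents: vmstat 1, top, or mpstat -P ALL 1
```

A high system fraction points at the kernel (network stack, filtering); a high user fraction points at haproxy or its TLS library.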
>
> Mon, 23 Jan 2023 at 14:01, Willy Tarreau <w...@1wt.eu>:
>
>> Hi Marc,
>>
>> On Mon, Jan 23, 2023 at 12:13:13AM -0600, Marc West wrote:
>> (...)
>> > I understand that raw performance on OpenBSD is sometimes not as high as
>> > other OSes in some scenarios, but the difference of 500 vs 10,000+
>> > req/sec and 1100 vs 40,000 connections here is very large so I wanted to
>> > see if there are any thoughts, known issues, or tunables that could
>> > possibly help improve HAProxy throughput on OpenBSD?
>>
>> Based on my experience a long time ago (~13-14 years), I remember that
>> PF's connection tracking didn't scale at all with the number of
>> connections. It was very clear that there was a very high per-packet
>> lookup cost indicating that a hash table was too small. Unfortunately
>> I didn't know how to change such settings, and since my home machine
>> was behind an ADSL line anyway, the line would have been filled long
>> before the hash table so I didn't really care. But I was a bit shocked
>> by this observation. I suppose that it has significantly evolved since
>> then, but it would be worth having a look around this.
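For reference, the state-table sizing Willy mentions is tunable in pf.conf nowadays; a sketch (the value is an example taken from the report below, not a recommendation):

```
# /etc/pf.conf -- grow the state table so per-packet lookups stay cheap
set limit states 1000000

# Inspect current usage and limits with:
#   pfctl -si    (state count and searches/inserts per second)
#   pfctl -sm    (configured limits)
```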
>>
>> > The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
>> > kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
>> > reports those as expected. PF state limit is raised to 1 million and
>> > peaked at 72k in use. BIOS power profile is set to max performance.
>>
>> I think you should try to flood the machine using UDP traffic to see
>> the difference between the part that happens in the network stack and
>> the part that happens in the rest of the system (haproxy included). If
>> a small UDP flood on accepted ports brings the machine on its knees,
>> it's definitely related to the network stack and/or filtering/tracking.
>> If it does nothing to it, I would tend to say that the lower network
>> layers and PF are innocent. This would leave us with TCP and haproxy.
>> A SYN flood test could be useful, maybe the listening queues are too
>> small and incoming packets are dropped too fast.
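If the listening queue does turn out to be the issue, haproxy lets you raise it per proxy with the "backlog" keyword (a sketch; the value is an example, and the kernel's own somaxconn limit still caps what is effectively granted):

```
listen test_https
    bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
    backlog 10000   # example value; clamped by the OS listen-queue limit
```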
>>
>> At the TCP layer, a long time ago OpenBSD used to be a bit extremist
>> in the way it produces random sequence numbers. I don't know how it
>> is today nor if this has a significant cost. Similarly, outgoing
>> connections will need a random source port, and this can be expensive,
>> particularly when the number of concurrent connections rises and ports
>> become scarce, though you said that even blocked traffic causes harm
>> to the machine, so I doubt this is your concern for now.
>>
>> > pid = 78180 (process #1, nbproc = 1, nbthread = 32)
>> > uptime = 1d 19h10m11s
>> > system limits: memmax = unlimited; ulimit-n = 200000
>> > maxsock = 200000; maxconn = 99904; maxpipes = 0
>> >
>> > No errors that I can see in logs about hitting any limits. There is no
>> > change in results with http vs https, http/1.1 vs h2, with or without
>> > httplog, or reducing nbthread on this 40 core machine. If there are any
>> > other details I can provide please let me know.
>>
>> At least I'm seeing you're using kqueue, which is a good point.
>>
>> >   source  0.0.0.0 usesrc clientip
>>
>> I don't know if it's on-purpose that you're using transparent proxying
>> to the servers, but it's very likely that it will increase the processing
>> cost at the lower layers by creating extra states in the network sessions
>> table. Again this will only have an effect for traffic between haproxy and
>> the servers.
>>
>> > listen test_https
>> >   bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
>>
>> One thing you can try here is to duplicate that line to have multiple
>> listening sockets (or just append "shards X" to specify the number of
>> sockets you want). One of the benefits is that it will multiply the
>> number of listening sockets hence increase the global queue size. Maybe
>> some of your packets are lost in socket queues and this could improve
>> the situation.
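A sketch of the two variants described above (the shard count is an example; the "shards" bind option needs a reasonably recent haproxy, 2.5 or later):

```
listen test_https
    # either duplicate the bind line so each gets its own socket and queue ...
    bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
    bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1

    # ... or ask for N sockets in one line:
    # bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1 shards 4
```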
>>
>> I don't know if you have something roughly equivalent to "perf" on
>> OpenBSD nowadays, as that could prove extremely useful to figure where
>> the CPU time is spent. Other than that I'm a bit out of ideas.
>>
>> Willy
>>
>>
