Also, I wonder how LibreSSL and OpenSSL compare in performance.
I'll try "openssl speed" (I recall LibreSSL has the same feature), but I'm
not sure I can get an OpenBSD machine.
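
A minimal sketch of how such a comparison could be scripted, so that the same benchmark runs unchanged on each machine. The helper names are made up for illustration, and the algorithm names in the usage note (rsa2048, ecdsap256) are what OpenSSL accepts; I have not checked that LibreSSL accepts exactly the same list.

```python
import shutil
import subprocess


def build_speed_command(algorithms, openssl="openssl"):
    """Return the argv for `openssl speed <algorithms...>`."""
    return [openssl, "speed", *algorithms]


def run_speed(algorithms, openssl="openssl"):
    """Run the benchmark and return its stdout as text.

    Returns None when the binary cannot be found on PATH, so the caller
    can tell "not installed" apart from an empty result.
    """
    if shutil.which(openssl) is None:
        return None
    result = subprocess.run(
        build_speed_command(algorithms, openssl),
        capture_output=True,  # the summary table goes to stdout
        text=True,
        check=False,
    )
    return result.stdout
```

Calling run_speed(["rsa2048", "ecdsap256"]) on both hosts and diffing the summary tables would give a first idea; note that each algorithm runs for several seconds.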

Can you try haproxy + openssl-1.1.1 (it is considered the most performant
these days)?

Mon, 23 Jan 2023 at 14:17, Илья Шипицин <chipits...@gmail.com>:

> And a fun fact from my own experience:
> I used to run a load balancer on FreeBSD with OpenSSL built from ports.
> Somehow I had set "assembler optimization" to "no", so OpenSSL's big-number
> arithmetic was built in its slow form.
>
> Using the "perf" tool I found that BN functions accounted for a big share
> of the profile, something like 25% overall.
>
> Later I used "openssl speed" to compare Linux and FreeBSD (on the required
> cipher suites).
>
> How can I interpret openssl speed output? - Stack Overflow
> <https://stackoverflow.com/questions/17410270/how-can-i-interpret-openssl-speed-output>
>
> Mon, 23 Jan 2023 at 14:11, Илья Шипицин <chipits...@gmail.com>:
>
>> I would start with the big picture:
>>
>> 1) are the CPUs utilized at 100%?
>> 2) what is the CPU usage in detail: the fractions of system, user, idle ...?
>>
>> That will let us narrow things down and find whether the bottleneck is in
>> kernel space or user space.
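
The two questions above can be sketched as a small calculation over two snapshots of cumulative per-state CPU tick counters, the kind of numbers that tools like top or vmstat turn into percentages. The state names ("user", "sys", "intr", "idle"), the 90% busy threshold and both function names are assumptions made for the example, not anything OpenBSD or haproxy prescribes.

```python
def cpu_fractions(before, after):
    """Turn two snapshots of cumulative tick counters into fractions.

    `before` and `after` map a CPU state name to its tick count.
    """
    deltas = {state: after[state] - before[state] for state in after}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between the two snapshots")
    return {state: delta / total for state, delta in deltas.items()}


def likely_bottleneck(fractions, busy_threshold=0.9):
    """Classify where the time goes (threshold is an arbitrary assumption)."""
    busy = 1.0 - fractions.get("idle", 0.0)
    if busy < busy_threshold:
        # CPUs are not saturated, so look elsewhere (queues, locks, limits).
        return "not CPU-bound"
    kernel = fractions.get("sys", 0.0) + fractions.get("intr", 0.0)
    user = fractions.get("user", 0.0)
    return "kernel space" if kernel > user else "user space"
```

A "kernel space" answer would point towards the network stack or PF, a "user space" answer towards haproxy or the TLS library.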
>>
>> Mon, 23 Jan 2023 at 14:01, Willy Tarreau <w...@1wt.eu>:
>>
>>> Hi Marc,
>>>
>>> On Mon, Jan 23, 2023 at 12:13:13AM -0600, Marc West wrote:
>>> (...)
>>> > I understand that raw performance on OpenBSD is sometimes not as high as
>>> > other OSes in some scenarios, but the difference of 500 vs 10,000+
>>> > req/sec and 1100 vs 40,000 connections here is very large so I wanted to
>>> > see if there are any thoughts, known issues, or tunables that could
>>> > possibly help improve HAProxy throughput on OpenBSD?
>>>
>>> Based on my experience a long time ago (~13-14 years), I remember that
>>> PF's connection tracking didn't scale at all with the number of
>>> connections. It was very clear that there was a very high per-packet
>>> lookup cost indicating that a hash table was too small. Unfortunately
>>> I didn't know how to change such settings, and since my home machine
>>> was behind an ADSL line anyway, the line would have been filled long
>>> before the hash table so I didn't really care. But I was a bit shocked
>>> by this observation. I suppose that since then it has significantly
>>> evolved, but it would be worth having a look around this.
>>>
>>> > The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
>>> > kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
>>> > reports those as expected. PF state limit is raised to 1 million and
>>> > peaked at 72k in use. BIOS power profile is set to max performance.
>>>
>>> I think you should try to flood the machine using UDP traffic to see
>>> the difference between the part that happens in the network stack and
>>> the part that happens in the rest of the system (haproxy included). If
>>> a small UDP flood on accepted ports brings the machine to its knees,
>>> it's definitely related to the network stack and/or filtering/tracking.
>>> If it does nothing to it, I would tend to say that the lower network
>>> layers and PF are innocent. This would leave us with TCP and haproxy.
>>> A SYN flood test could be useful, maybe the listening queues are too
>>> small and incoming packets are dropped too fast.
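
A minimal sketch of the UDP part of that test, assuming a test machine you control: it sends a bounded burst of small datagrams to one port and reports how many the local stack accepted. The function name, the default payload size and the choice to skip failed sends are all assumptions for illustration; a dedicated traffic generator would be the better tool for a real flood.

```python
import socket


def send_udp_burst(host, port, count, payload_size=64):
    """Send `count` UDP datagrams of `payload_size` bytes to host:port.

    Returns the number of datagrams the local stack accepted for sending.
    """
    payload = b"\x00" * payload_size
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            try:
                sock.sendto(payload, (host, port))
                sent += 1
            except OSError:
                # For instance a full local send buffer: skip this datagram.
                pass
    return sent
```

Watching the target's CPU usage while the burst runs, with haproxy idle, is what separates the network-stack cost from the rest.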
>>>
>>> At the TCP layer, a long time ago OpenBSD used to be a bit extremist
>>> in the way it produces random sequence numbers. I don't know how it
>>> is today nor if this has a significant cost. Similarly, outgoing
>>> connections will need a random source port, and this can be expensive,
>>> particularly when the number of concurrent connections rises and ports
>>> become scarce, though you said that even blocked traffic causes harm
>>> to the machine, so I doubt this is your concern for now.
>>>
>>> > pid = 78180 (process #1, nbproc = 1, nbthread = 32)
>>> > uptime = 1d 19h10m11s
>>> > system limits: memmax = unlimited; ulimit-n = 200000
>>> > maxsock = 200000; maxconn = 99904; maxpipes = 0
>>> >
>>> > No errors that I can see in logs about hitting any limits. There is no
>>> > change in results with http vs https, http/1.1 vs h2, with or without
>>> > httplog, or reducing nbthread on this 40 core machine. If there are any
>>> > other details I can provide please let me know.
>>>
>>> At least I'm seeing you're using kqueue, which is a good point.
>>>
>>> >   source  0.0.0.0 usesrc clientip
>>>
>>> I don't know if it's on purpose that you're using transparent proxying
>>> to the servers, but it's very likely that it will increase the processing
>>> cost at the lower layers by creating extra states in the network sessions
>>> table. Again this will only have an effect for traffic between haproxy
>>> and the servers.
>>>
>>> > listen test_https
>>> >   bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
>>>
>>> One thing you can try here is to duplicate that line to have multiple
>>> listening sockets (or just append "shards X" to specify the number of
>>> sockets you want). One of the benefits is that it will multiply the
>>> number of listening sockets hence increase the global queue size. Maybe
>>> some of your packets are lost in socket queues and this could improve
>>> the situation.
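
To illustrate the idea of several listening sockets on one address, each with its own accept queue, here is a minimal sketch using SO_REUSEPORT. It is not haproxy's code: the function name and backlog value are made up, and it assumes an OS where several sockets may bind the same IP:port this way. Whether the kernel then spreads incoming connections across those sockets differs between systems, and the sketch does not establish that for OpenBSD.

```python
import socket


def open_sharded_listeners(host, port, shards, backlog=128):
    """Open `shards` listening TCP sockets on the same host:port.

    With port 0 the first socket picks a free port and the others reuse it.
    """
    listeners = []
    for _ in range(shards):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Must be set before bind() on every socket sharing the port.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        if port == 0:
            port = sock.getsockname()[1]
        sock.listen(backlog)  # each socket gets its own accept queue
        listeners.append(sock)
    return listeners
```

With N sockets the total room for pending connections is roughly N times the per-socket backlog, which is the effect described above.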
>>>
>>> I don't know if you have something roughly equivalent to "perf" on
>>> OpenBSD nowadays, as that could prove extremely useful to figure out where
>>> the CPU time is spent. Other than that I'm a bit out of ideas.
>>>
>>> Willy
>>>
>>>
