On Wed, Apr 23, 2025 at 05:11:18PM +0200, Theo Buehler wrote:
> On Wed, Apr 23, 2025 at 12:35:51PM +0200, Andreas Bartelt wrote:
> > Hi,
> >
> > I've noticed that aes-128-gcm performance with scp(1) on amd64 based CPUs is
> > much slower than expected on OpenBSD (i.e., I remember throughput being
> > significantly better some time ago -- I think I saw much better throughput
> > around the time when LRO and TSO were initially enabled for ix(4)). It looks
> > to me like AES-NI isn't effectively used anymore.
>
> Right. Thanks for the report. The immediate reason for this is that ssh
> relies on calls to OpenSSL_add_all_algorithms() to initialize libcrypto.
> However, the call to OPENSSL_cpuid_setup() was removed from this function
> (OPENSSL_add_all_algorithms_noconf()) in c_all.c r1.32 aka
>
> https://github.com/openbsd/src/commit/b2368ebdada0d6d022d20bbe96eab69dbc406e5a
>
> which means that the cpuid probe choosing an accelerated version if HW
> support is available is no longer set up. This coincidentally happened
> about a week after LRO was enabled by bluhm for all drivers in:
>
> https://github.com/openbsd/src/commit/3e1926f859efd008e94373bdb5bd5e8d9fb98874
>
> Another bit that will hurt is that ssh switched from aes-128-ctr to
> aes-128-gcm by default last December:
>
> https://github.com/openbsd/src/commit/08d45e79c0d607376dd5c42234e36d78473c3ae0
>
> This doesn't make much of a difference in the unaccelerated case:
>
> Without AES-NI
> type             16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> aes-128-gcm    174617.32k    211996.90k    693919.98k    754392.03k    775449.26k
> aes-128-ctr    185805.70k    216658.12k    778577.33k    888563.84k    915544.45k
>
> but, since our GCM ASM is pretty bad, this will hurt in the accelerated
> case. jsing will be looking into improving that since this is also
> important for TLS.
>
> With AES-NI:
> type             16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
> aes-128-gcm    669421.74k   1886143.60k   3064423.66k   3495542.89k   3564934.49k
> aes-128-ctr    990493.56k   3246635.81k   6959034.82k   9812672.93k  11506436.47k
>
> While we could (and probably should) add OPENSSL_init_crypto() calls to
> the various *add_all* API, I think a better first fix will be this,
> which means that the cpuid_setup happens whenever a cipher or a digest
> is invoked via EVP and the accelerated implementation should be chosen
> if available:

And here's the diff including the *add_all* API, which are also needed
(otherwise the first call to cipher_init() will still end up using the
unaccelerated implementation).

Index: crypto_init.c
===================================================================
RCS file: /cvs/src/lib/libcrypto/crypto_init.c,v
diff -u -p -r1.22 crypto_init.c
--- crypto_init.c	17 Oct 2024 14:27:57 -0000	1.22
+++ crypto_init.c	23 Apr 2025 12:27:08 -0000
@@ -99,18 +99,24 @@ LCRYPTO_ALIAS(OPENSSL_cleanup);
 void
 OpenSSL_add_all_ciphers(void)
 {
+	/* Prayer and clean living lets you ignore errors, OpenSSL style. */
+	(void)OPENSSL_init_crypto(0, NULL);
 }
 LCRYPTO_ALIAS(OpenSSL_add_all_ciphers);
 
 void
 OpenSSL_add_all_digests(void)
 {
+	/* Prayer and clean living lets you ignore errors, OpenSSL style. */
+	(void)OPENSSL_init_crypto(0, NULL);
 }
 LCRYPTO_ALIAS(OpenSSL_add_all_digests);
 
 void
 OPENSSL_add_all_algorithms_noconf(void)
 {
+	/* Prayer and clean living lets you ignore errors, OpenSSL style. */
+	(void)OPENSSL_init_crypto(0, NULL);
 }
 LCRYPTO_ALIAS(OPENSSL_add_all_algorithms_noconf);
Index: evp/evp_cipher.c
===================================================================
RCS file: /cvs/src/lib/libcrypto/evp/evp_cipher.c,v
diff -u -p -r1.23 evp_cipher.c
--- evp/evp_cipher.c	10 Apr 2024 15:00:38 -0000	1.23
+++ evp/evp_cipher.c	23 Apr 2025 13:52:22 -0000
@@ -614,6 +614,9 @@ LCRYPTO_ALIAS(EVP_DecryptFinal_ex);
 EVP_CIPHER_CTX *
 EVP_CIPHER_CTX_new(void)
 {
+	if (!OPENSSL_init_crypto(0, NULL))
+		return NULL;
+
 	return calloc(1, sizeof(EVP_CIPHER_CTX));
 }
 LCRYPTO_ALIAS(EVP_CIPHER_CTX_new);
Index: evp/evp_digest.c
===================================================================
RCS file: /cvs/src/lib/libcrypto/evp/evp_digest.c,v
diff -u -p -r1.14 evp_digest.c
--- evp/evp_digest.c	10 Apr 2024 15:00:38 -0000	1.14
+++ evp/evp_digest.c	23 Apr 2025 13:14:36 -0000
@@ -226,6 +226,9 @@ LCRYPTO_ALIAS(EVP_Digest);
 EVP_MD_CTX *
 EVP_MD_CTX_new(void)
 {
+	if (!OPENSSL_init_crypto(0, NULL))
+		return NULL;
+
 	return calloc(1, sizeof(EVP_MD_CTX));
 }
 LCRYPTO_ALIAS(EVP_MD_CTX_new);
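
For anyone who wants the idea without reading libcrypto: the diff relies on the init call being cheap and safe to repeat, so it can be placed in every entry point and the probe still runs only once. Below is a rough toy model of that pattern, not libcrypto code. All toy_* names are made up for illustration, the "probe" is just a counter, and the plain flag is an assumption that ignores thread safety (real code would need a once-style guard).

```c
#include <stdlib.h>

/*
 * Toy model only: every toy_* name is hypothetical and none of this is
 * libcrypto code. The counter stands in for the one-time cpuid probe.
 */
static int toy_initialized;	/* has the one-time setup run? */
static int toy_probe_count;	/* how often the simulated probe ran */

/* Stand-in for OPENSSL_init_crypto(0, NULL): returns 1 on success. */
static int
toy_init_crypto(void)
{
	if (!toy_initialized) {
		/* Assumption: single-threaded; real code needs a guard. */
		toy_probe_count++;
		toy_initialized = 1;
	}
	return 1;
}

/* Stand-in for the *add_all* functions: init, ignoring the result. */
static void
toy_add_all_algorithms(void)
{
	(void)toy_init_crypto();
}

struct toy_ctx {
	int unused;
};

/* Stand-in for EVP_CIPHER_CTX_new(): fail if initialization fails. */
static struct toy_ctx *
toy_ctx_new(void)
{
	if (!toy_init_crypto())
		return NULL;

	return calloc(1, sizeof(struct toy_ctx));
}
```

Calling toy_ctx_new(), then toy_add_all_algorithms(), then toy_ctx_new() again leaves toy_probe_count at 1, which is the property that makes it reasonable to put the init call in both the *add_all* functions and the context constructors.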