On Fri, 9 Jan 2026 at 02:27, Eric Biggers <[email protected]> wrote:
>
> On Thu, Jan 08, 2026 at 12:26:18PM -0800, Eric Biggers wrote:
> > On Thu, Jan 08, 2026 at 12:32:00PM +0100, Ard Biesheuvel wrote:
> > > On Mon, 5 Jan 2026 at 06:14, Eric Biggers <[email protected]> wrote:
> > > >
> > > > This series applies to libcrypto-next.  It can also be retrieved from:
> > > >
> > > >     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git aes-lib-v1
> > > >
> > > > This series makes three main improvements to the kernel's AES library:
> > > >
> > > >   1. Make it use the kernel's existing architecture-optimized AES code,
> > > >      including AES instructions, when available.  Previously, only the
> > > >      traditional crypto API gave access to the optimized AES code.
> > > >      (As a reminder, AES instructions typically make AES over 10 times
> > > >      as fast as the generic code.  They also make it constant-time.)
> > > >
> > > >   2. Support preparing an AES key for only the forward direction of the
> > > >      block cipher, using about half as much memory.  This is a helpful
> > > >      optimization for many common AES modes of operation.  It also helps
> > > >      keep structs small enough to be allocated on the stack, especially
> > > >      considering potential future library APIs for AES modes.
> > > >
> > > >   3. Replace the library's generic AES implementation with a much faster
> > > >      one that is almost as fast as "aes-generic", while still keeping
> > > >      the table size reasonably small and maintaining some constant-time
> > > >      hardening.  This allows removing "aes-generic", unifying the
> > > >      current two generic AES implementations in the kernel tree.
> > > >
> > >
> > > Architectures that support memory operands will be impacted by
> > > dropping the pre-rotated lookup tables, especially if they have few
> > > GPRs.
> > >
> > > I suspect that doesn't really matter in practice: if your pre-AESNI
> > > IA-32 workload has a bottleneck on "aes-generic", you would have
> > > probably moved it to a different machine by now. But the performance
> > > delta will likely be noticeable so it is something that deserves a
> > > mention.
> >
> > Sure.  I only claimed that the new implementation is "almost as fast" as
> > aes-generic, not "as fast".
> >
> > By the way, these are the results I get for crypto_cipher_encrypt_one()
> > and crypto_cipher_decrypt_one() (averaged together) in a loop on an i386
> > kernel patched to not use AES-NI:
> >
> >     aes-fixed-time:  77 MB/s
> >     aes-generic:    192 MB/s
> >     aes-lib:        185 MB/s
> >
> > I'm not sure how relevant these are, considering that this was collected
> > on a modern CPU, not one of the (very) old ones that would actually be
> > running i386 non-AESNI code.  But if they are even vaguely
> > representative, this suggests the new code does quite well: little
> > slowdown over aes-generic, while adding some constant-time hardening
> > (which arguably was an undeserved shortcut to not include before) and
> > also using a lot less dcache.
> >
> > At the same time, there's clearly a large speedup vs. aes-fixed-time.
> > So this will actually be a significant performance improvement on
> > systems that were using aes-fixed-time.  Many people may have been doing
> > that unintentionally, due to it being set to a higher priority than
> > aes-generic in the crypto_cipher API.
> >
> > I'll also note that the state of the art for parallelizable AES modes on
> > CPUs without AES instructions is bit-slicing with vector registers.  The
> > kernel has such code for arm and arm64, but not for x86.  If x86 without
> > AES-NI was actually important, we should be adding that.  But it seems
> > clear that x86 CPUs have moved on, and hardly anyone cares anymore.  If
> > for now we can just provide something that's almost as fast as before
> > (and maybe even a lot faster in some cases!), that seems fine.
>
> It's also worth emphasizing that there are likely to be systems that
> support AES instructions but are not using them due to the corresponding
> kconfig options (e.g. CONFIG_CRYPTO_AES_NI_INTEL) not being set to 'y'.
> As we know, missing the crypto optimization kconfig options is a common
> mistake.  This series fixes that for single-block AES.
>
> So (in addition to the aes-fixed-time case) that's another case that
> just gets faster, and where the difference between aes-generic and the
> new generic code isn't actually relevant.
>

Fair enough. Thanks for the elaboration.
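
For reference, the relative figures implied by the i386 numbers quoted
above can be worked out with a quick sketch like the one below. It is
plain arithmetic on the three quoted throughput values and nothing
more; the dictionary and helper names are illustrative, not anything
from the series, and the caveat about the numbers coming from a modern
CPU applies just the same.

```python
# Throughput figures (MB/s) as quoted in the thread for an i386 kernel
# patched to not use AES-NI. Names below are illustrative only.
throughput_mb_s = {
    "aes-fixed-time": 77,
    "aes-generic": 192,
    "aes-lib": 185,
}


def relative(name, baseline):
    """Return the throughput of `name` as a fraction of `baseline`."""
    return throughput_mb_s[name] / throughput_mb_s[baseline]


# Fractional slowdown of the new library code relative to aes-generic.
slowdown_vs_generic = 1 - relative("aes-lib", "aes-generic")

# Speedup factor of the new library code relative to aes-fixed-time.
speedup_vs_fixed_time = relative("aes-lib", "aes-fixed-time")

print(f"aes-lib vs aes-generic:    {slowdown_vs_generic:.1%} slower")
print(f"aes-lib vs aes-fixed-time: {speedup_vs_fixed_time:.2f}x as fast")
```

On those figures the new code is under 4% slower than aes-generic and
roughly 2.4 times as fast as aes-fixed-time.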
