On Thu, Jan 08, 2026 at 12:32:00PM +0100, Ard Biesheuvel wrote:
> On Mon, 5 Jan 2026 at 06:14, Eric Biggers <[email protected]> wrote:
> >
> > This series applies to libcrypto-next.  It can also be retrieved from:
> >
> >     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git aes-lib-v1
> >
> > This series makes three main improvements to the kernel's AES library:
> >
> >   1. Make it use the kernel's existing architecture-optimized AES code,
> >      including AES instructions, when available.  Previously, only the
> >      traditional crypto API gave access to the optimized AES code.
> >      (As a reminder, AES instructions typically make AES over 10 times
> >      as fast as the generic code.  They also make it constant-time.)
> >
> >   2. Support preparing an AES key for only the forward direction of the
> >      block cipher, using about half as much memory.  This is a helpful
> >      optimization for many common AES modes of operation.  It also helps
> >      keep structs small enough to be allocated on the stack, especially
> >      considering potential future library APIs for AES modes.
> >
> >   3. Replace the library's generic AES implementation with a much faster
> >      one that is almost as fast as "aes-generic", while still keeping
> >      the table size reasonably small and maintaining some constant-time
> >      hardening.  This allows removing "aes-generic", unifying the
> >      current two generic AES implementations in the kernel tree.
> >
> 
> Architectures that support memory operands will be impacted by
> dropping the pre-rotated lookup tables, especially if they have few
> GPRs.
> 
> I suspect that doesn't really matter in practice: if your pre-AESNI
> IA-32 workload has a bottleneck on "aes-generic", you would have
> probably moved it to a different machine by now. But the performance
> delta will likely be noticeable so it is something that deserves a
> mention.

Sure.  I only claimed that the new implementation is "almost as fast" as
aes-generic, not "as fast".

By the way, these are the results I get for crypto_cipher_encrypt_one()
and crypto_cipher_decrypt_one() (averaged together) in a loop on an i386
kernel patched to not use AES-NI:

    aes-fixed-time: 77 MB/s
    aes-generic: 192 MB/s
    aes-lib: 185 MB/s

I'm not sure how relevant these are, considering that they were collected
on a modern CPU, not one of the (very) old ones that would actually be
running i386 non-AESNI code.  But if they are even vaguely
representative, this suggests the new code does quite well: little
slowdown over aes-generic, while adding some constant-time hardening
(omitting it before was arguably an undeserved shortcut) and also using
a lot less dcache.
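
To put numbers on "little slowdown", here is a minimal sketch that only
does arithmetic on the three figures quoted above.  The helper names
(relative_speed, percent_slowdown) are made up for illustration, and the
inputs are the single set of measurements from this mail, so the ratios
are only as representative as that one test machine.

```python
# Throughput figures quoted above, in MB/s (i386 kernel, AES-NI disabled).
RESULTS_MBPS = {
    "aes-fixed-time": 77,
    "aes-generic": 192,
    "aes-lib": 185,
}


def relative_speed(impl, baseline, results=RESULTS_MBPS):
    """Return the throughput of impl divided by the throughput of baseline."""
    return results[impl] / results[baseline]


def percent_slowdown(impl, baseline, results=RESULTS_MBPS):
    """Return how much slower impl is than baseline, as a percentage."""
    return (1.0 - relative_speed(impl, baseline, results)) * 100.0


if __name__ == "__main__":
    # aes-lib compared with the implementation it replaces.
    print(f"aes-lib vs aes-generic: "
          f"{percent_slowdown('aes-lib', 'aes-generic'):.1f}% slower")
    # aes-lib compared with the previous constant-time option.
    print(f"aes-lib vs aes-fixed-time: "
          f"{relative_speed('aes-lib', 'aes-fixed-time'):.2f}x as fast")
```

With these inputs, aes-lib comes out under 4% slower than aes-generic and
roughly 2.4 times as fast as aes-fixed-time.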

At the same time, there's clearly a large speedup vs. aes-fixed-time.
So this will actually be a significant performance improvement on
systems that were using aes-fixed-time.  Many people may have been doing
that unintentionally, due to it being set to a higher priority than
aes-generic in the crypto_cipher API.
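
The "unintentional" part follows from how a request by algorithm name is
resolved: among the registered implementations of "aes", the one with the
highest priority is picked.  The following is a simplified model of that
rule, not kernel code.  CipherImpl and pick_implementation are
hypothetical names, and the priority values are illustrative, chosen only
so that aes-fixed-time outranks aes-generic as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CipherImpl:
    """Hypothetical stand-in for a registered cipher implementation."""
    driver_name: str
    priority: int


def pick_implementation(candidates):
    """Return the candidate with the highest priority."""
    return max(candidates, key=lambda impl: impl.priority)


# Illustrative priorities only: the point is the ordering, not the values.
registered = [
    CipherImpl("aes-generic", 100),
    CipherImpl("aes-fixed-time", 101),
]

# A caller that just asks for "aes" gets whichever entry ranks highest.
chosen = pick_implementation(registered)
print(chosen.driver_name)
```

Under this model, a caller that never names a driver explicitly ends up
on aes-fixed-time whenever both are registered, which is how a system can
land on the slower code without anyone having chosen it.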

I'll also note that the state of the art for parallelizable AES modes on
CPUs without AES instructions is bit-slicing with vector registers.  The
kernel has such code for arm and arm64, but not for x86.  If x86 without
AES-NI were actually important, we should be adding that.  But it seems
clear that x86 CPUs have moved on, and hardly anyone cares anymore.  If
for now we can just provide something that's almost as fast as before
(and maybe even a lot faster in some cases!), that seems fine.

- Eric
