On Fri, 9 Jan 2026 at 02:27, Eric Biggers <[email protected]> wrote:
>
> On Thu, Jan 08, 2026 at 12:26:18PM -0800, Eric Biggers wrote:
> > On Thu, Jan 08, 2026 at 12:32:00PM +0100, Ard Biesheuvel wrote:
> > > On Mon, 5 Jan 2026 at 06:14, Eric Biggers <[email protected]> wrote:
> > > >
> > > > This series applies to libcrypto-next.  It can also be retrieved from:
> > > >
> > > >     git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux.git aes-lib-v1
> > > >
> > > > This series makes three main improvements to the kernel's AES library:
> > > >
> > > >   1. Make it use the kernel's existing architecture-optimized AES code,
> > > >      including AES instructions, when available.  Previously, only the
> > > >      traditional crypto API gave access to the optimized AES code.
> > > >      (As a reminder, AES instructions typically make AES over 10 times
> > > >      as fast as the generic code.  They also make it constant-time.)
> > > >
> > > >   2. Support preparing an AES key for only the forward direction of the
> > > >      block cipher, using about half as much memory.  This is a helpful
> > > >      optimization for many common AES modes of operation.  It also helps
> > > >      keep structs small enough to be allocated on the stack, especially
> > > >      considering potential future library APIs for AES modes.
> > > >
> > > >   3. Replace the library's generic AES implementation with a much faster
> > > >      one that is almost as fast as "aes-generic", while still keeping
> > > >      the table size reasonably small and maintaining some constant-time
> > > >      hardening.  This allows removing "aes-generic", unifying the
> > > >      current two generic AES implementations in the kernel tree.
> > > >
> > >
> > > Architectures that support memory operands will be impacted by
> > > dropping the pre-rotated lookup tables, especially if they have few
> > > GPRs.
> > >
> > > I suspect that doesn't really matter in practice: if your pre-AESNI
> > > IA-32 workload has a bottleneck on "aes-generic", you would have
> > > probably moved it to a different machine by now. But the performance
> > > delta will likely be noticeable so it is something that deserves a
> > > mention.
> >
> > Sure.  I only claimed that the new implementation is "almost as fast" as
> > aes-generic, not "as fast".
> >
> > By the way, these are the results I get for crypto_cipher_encrypt_one()
> > and crypto_cipher_decrypt_one() (averaged together) in a loop on an i386
> > kernel patched to not use AES-NI:
> >
> >     aes-fixed-time: 77 MB/s
> >     aes-generic: 192 MB/s
> >     aes-lib: 185 MB/s
> >
> > I'm not sure how relevant these are, considering that this was collected
> > on a modern CPU, not one of the (very) old ones that would actually be
> > running i386 non-AESNI code.  But if they are even vaguely
> > representative, this suggests the new code does quite well: little
> > slowdown over aes-generic, while adding some constant-time hardening
> > (which arguably was an undeserved shortcut to not include before) and
> > also using a lot less dcache.
> >
> > At the same time, there's clearly a large speedup vs. aes-fixed-time.
> > So this will actually be a significant performance improvement on
> > systems that were using aes-fixed-time.  Many people may have been doing
> > that unintentionally, due to it being set to a higher priority than
> > aes-generic in the crypto_cipher API.
> >
> > I'll also note that the state of the art for parallelizable AES modes on
> > CPUs without AES instructions is bit-slicing with vector registers.  The
> > kernel has such code for arm and arm64, but not for x86.  If x86 without
> > AES-NI was actually important, we should be adding that.  But it seems
> > clear that x86 CPUs have moved on, and hardly anyone cares anymore.  If
> > for now we can just provide something that's almost as fast as before
> > (and maybe even a lot faster in some cases!), that seems fine.
>
> It's also worth emphasizing that there are likely to be systems that
> support AES instructions but are not using them due to the corresponding
> kconfig options (e.g. CONFIG_CRYPTO_AES_NI_INTEL) not being set to 'y'.
> As we know, missing the crypto optimization kconfig options is a common
> mistake.  This series fixes that for single-block AES.
>
> So (in addition to the aes-fixed-time case) that's another case that
> just gets faster, and where the difference between aes-generic and the
> new generic code isn't actually relevant.
>

Fair enough. Thanks for the elaboration.
