Michael Weiser <[email protected]> writes:

> The arm64 branch builds and passes the testsuite on aarch64 and
> aarch64_be with gcc 10.2 and clang 11.0.1 with and without the optimized
> assembly routines on my pine64 boards. This is with the .arch directive
> instead of modifying CFLAGS and the new configure option name
> --enable-arm64-crypto.

Thanks for testing! (My own testing was done with a cross-compiler and
user-level qemu).

> Out of curiosity I've also collected some benchmark numbers for
> gcm_aes256. (Is that a correct and sensible algorithm for that purpose?)

I think that's appropriate for benchmarking gcm_hash, but the "update"
numbers are the ones that reflect gcm_hash performance.

> The speedup from using pmull seems to be around 35% for encrypt/decrypt.
>
> Interestingly, LE is about a cycle per block faster than BE even though
> it should have quite a few more rev64s to execute than BE. Could this be
> masked by memory accesses, pipelining or scheduling?

For the encrypt/decrypt operations, you also run AES (in CTR mode),
which works with little-endian data.

> How is the massive speedup in update to be interpreted and that BE here
> is indeed quite a bit faster than LE? Do I understand correctly that on
> update only GCM is run on unencrypted data for authentication purposes
> so that this number really indicates the pure GCM pmull speedup?

That's right, the "update" benchmark runs only the authentication part
of gcm, i.e., gcm_hash. That is useful for benchmarking gcm_hash, but
probably not so relevant for real-world applications, since I'd expect
it's rare to pass large amounts of "associated data" to gcm.
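
To make concrete what the authentication part computes, here is a
minimal bit-by-bit sketch of the GHASH step in portable C. It is not
Nettle's code: the names gf128_mul and ghash_update are made up for
illustration, and it is far slower than either the table-based or the
pmull version. It only shows the operation that those versions
accelerate: a multiplication in GF(2^128) per 16-byte block.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a Nettle function): out = x * y in GF(2^128),
   using GCM's bit order, where the most significant bit of byte 0 is the
   coefficient of x^0. out may alias x or y. */
void gf128_mul(uint8_t out[16], const uint8_t x[16], const uint8_t y[16])
{
  uint8_t z[16] = {0};
  uint8_t v[16];
  memcpy(v, y, 16);

  for (int i = 0; i < 128; i++) {
    /* If bit i of x is set, add (xor) the current multiple of y. */
    if (x[i / 8] & (0x80 >> (i % 8)))
      for (int j = 0; j < 16; j++)
        z[j] ^= v[j];

    /* v = v * x: shift right one bit, reduce with the GCM polynomial. */
    int carry = v[15] & 1;
    for (int j = 15; j > 0; j--)
      v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
    v[0] >>= 1;
    if (carry)
      v[0] ^= 0xe1;
  }
  memcpy(out, z, 16);
}

/* Hypothetical helper: absorb data into the hash state, one block at a
   time. A final partial block is treated as zero-padded. */
void ghash_update(uint8_t state[16], const uint8_t h[16],
                  const uint8_t *data, size_t len)
{
  while (len > 0) {
    size_t n = len < 16 ? len : 16;
    for (size_t j = 0; j < n; j++)
      state[j] ^= data[j];
    gf128_mul(state, state, h);
    data += n;
    len -= n;
  }
}
```

Here h stands for the hash subkey (the all-zero block encrypted with
the AES key). In the "update" benchmark only this kind of loop runs,
with no AES call per block, which is why it isolates the gcm_hash
speedup.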

> What's also curious is that the system's openssl 1.1.1i is consistently
> reported an order of magnitude faster than nettle. I guess the major
> factor is that there's no optimized AES for aarch64 yet in nettle which
> openssl seems to have.

That would be my guess too. And if we look at the update numbers only,
the new code appears a bit faster than openssl.

> Just out of curiosity: I assume there's no aesni-pmull-like GCM
> implementation for x86_64?

That's right. There's some assembly code, but using the same algorithm
as the C implementation, based on table lookups.
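
For readers unfamiliar with the table-lookup approach, a rough sketch
follows. It is an illustration under my own assumptions, not the actual
Nettle source: the struct and function names are hypothetical, and the
multiplication by x^8 is done with eight one-bit shifts, where a tuned
implementation would use a faster reduction. The idea is to precompute
the product of the key-dependent value h with every possible byte, and
then process the input one byte at a time.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: e[b] holds (byte b as a polynomial of degree < 8) * h. */
struct gf128_table { uint8_t e[256][16]; };

/* v = v * x in GF(2^128), GCM bit order (MSB of byte 0 is x^0). */
void gf128_shift(uint8_t v[16])
{
  int carry = v[15] & 1;
  for (int j = 15; j > 0; j--)
    v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
  v[0] >>= 1;
  if (carry)
    v[0] ^= 0xe1;
}

/* Build the 256-entry table for a fixed h. */
void gf128_table_init(struct gf128_table *t, const uint8_t h[16])
{
  memset(t->e[0], 0, 16);
  memcpy(t->e[0x80], h, 16);             /* byte 0x80 represents 1 */
  for (int b = 0x40; b > 0; b >>= 1) {   /* 0x40 is x, 0x20 is x^2, ... */
    memcpy(t->e[b], t->e[2 * b], 16);
    gf128_shift(t->e[b]);
  }
  /* All other entries follow by linearity: e[b ^ k] = e[b] ^ e[k]. */
  for (int b = 2; b < 256; b <<= 1)
    for (int k = 1; k < b; k++)
      for (int j = 0; j < 16; j++)
        t->e[b + k][j] = t->e[b][j] ^ t->e[k][j];
}

/* x = x * h, using one table lookup per input byte (Horner's rule,
   starting from the last byte). */
void gf128_table_mul(uint8_t x[16], const struct gf128_table *t)
{
  uint8_t z[16] = {0};
  for (int i = 15; i >= 0; i--) {
    for (int s = 0; s < 8; s++)          /* z = z * x^8 */
      gf128_shift(z);
    for (int j = 0; j < 16; j++)
      z[j] ^= t->e[x[i]][j];
  }
  memcpy(x, z, 16);
}
```

The table costs 4 KiB per key in this layout. A carry-less multiply
instruction such as pmull or pclmulqdq avoids both the table and the
data-dependent memory accesses.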

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.