Michael Weiser <[email protected]> writes:

> The arm64 branch builds and passes the testsuite on aarch64 and
> aarch64_be with gcc 10.2 and clang 11.0.1 with and without the optimized
> assembly routines on my pine64 boards. This is with the .arch directive
> instead of modifying CFLAGS and the new configure option name
> --enable-arm64-crypto.

Thanks for testing! (My own testing was done with a cross-compiler and
user-level qemu.)

> Out of curiosity I've also collected some benchmark numbers for
> gcm_aes256. (Is that a correct and sensible algorithm for that purpose?)

I think that's appropriate for benchmarking gcm_hash, but the "update"
numbers are the ones that reflect gcm_hash performance.

> The speedup from using pmull seems to be around 35% for encrypt/decrypt.
>
> Interestingly, LE is about a cycle per block faster than BE even though
> it should have quite a few more rev64s to execute than BE. Could this be
> masked by memory accesses, pipelining or scheduling?

For the encrypt/decrypt operations, you also run AES (in CTR mode),
which works with little-endian data.

> How is the massive speedup in update to be interpreted and that BE here
> is indeed quite a bit faster than LE? Do I understand correctly that on
> update only GCM is run on unencrypted data for authentication purposes
> so that this number really indicates the pure GCM pmull speedup?

That's right, the "update" numbers run only the authentication part of
gcm, i.e., gcm_hash. That is useful for benchmarking gcm_hash, but
probably not so relevant for real-world applications, since I'd expect
it's rare to pass large amounts of "associated data" to gcm.

> What's also curious is that the system's openssl 1.1.1i is consistently
> reported an order of magnitude faster than nettle. I guess the major
> factor is that there's no optimized AES for aarch64 yet in nettle which
> openssl seems to have.

That would be my guess too. And if we look at the update numbers only,
the new code appears a bit faster than openssl.

> Just out of curiosity: I assume there's no aesni-pmull-like GCM
> implementation for x86_64?

That's right. There's some assembly code, but it uses the same algorithm
as the C implementation, based on table lookups.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
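
For readers who want to see what the "update" benchmark exercises, here
is a minimal, unoptimized sketch of the hash step in C. It follows the
bit-by-bit GF(2^128) multiplication described in NIST SP 800-38D and is
not nettle's code: the names gf128_mul and ghash_update are hypothetical,
and nettle's C implementation uses table lookups (the new arm64 code
uses pmull) rather than this loop.

```c
#include <stdint.h>
#include <string.h>

#define GHASH_BLOCK_SIZE 16

/* Hypothetical helper: z = x * y in GF(2^128), using GCM's bit order
   (the most significant bit of byte 0 is the coefficient of x^0).
   z may alias x or y, since the result is accumulated separately. */
static void
gf128_mul(uint8_t *z, const uint8_t *x, const uint8_t *y)
{
  uint8_t v[GHASH_BLOCK_SIZE];
  uint8_t acc[GHASH_BLOCK_SIZE] = {0};
  int i, j;

  memcpy(v, y, GHASH_BLOCK_SIZE);
  for (i = 0; i < 128; i++)
    {
      /* If coefficient i of x is set, add the current multiple of y. */
      if (x[i / 8] & (0x80 >> (i % 8)))
        for (j = 0; j < GHASH_BLOCK_SIZE; j++)
          acc[j] ^= v[j];

      /* Multiply v by the variable x: shift right by one bit and
         reduce modulo x^128 + x^7 + x^2 + x + 1 (byte 0xe1). */
      int carry = v[GHASH_BLOCK_SIZE - 1] & 1;
      for (j = GHASH_BLOCK_SIZE - 1; j > 0; j--)
        v[j] = (uint8_t) ((v[j] >> 1) | (v[j - 1] << 7));
      v[0] >>= 1;
      if (carry)
        v[0] ^= 0xe1;
    }
  memcpy(z, acc, GHASH_BLOCK_SIZE);
}

/* Hypothetical helper: absorb whole 16-byte blocks into the hash state,
   state = (state XOR block) * h for each block in turn. */
static void
ghash_update(uint8_t *state, const uint8_t *h,
             const uint8_t *data, size_t blocks)
{
  int j;
  for (; blocks > 0; blocks--, data += GHASH_BLOCK_SIZE)
    {
      for (j = 0; j < GHASH_BLOCK_SIZE; j++)
        state[j] ^= data[j];
      gf128_mul(state, state, h);
    }
}
```

Each block costs 128 conditional XORs and shifts here, which is why
table-driven or carry-less-multiply versions of the same operation are
so much faster.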
