On Tue, Aug 10, 2021 at 11:55 PM Niels Möller <[email protected]> wrote:
> Maamoun TK <[email protected]> writes:
>
> > I made a merge request in the main repository that optimizes SHA1 for
> > s390x architecture with fat build support !33
> > <https://git.lysator.liu.se/nettle/nettle/-/merge_requests/33>.
>
> Regarding the discussion on
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/33#note_10005:
> It seems the sha1 instructions on s390x are fast enough that the
> overhead of loading constants, and loading and storing the state, all
> per block, is a significant cost.
>
> I think it makes sense to change the internal convention for
> _sha1_compress so that it can do multiple blocks. There are currently 5
> assembly implementations that would need updating: arm/v6, arm64/crypto,
> x86, x86_64 and x86_64/sha_ni. And the C implementation, of course.
>
> If it turns out to be too large a change to do them all at once, one
> could introduce some new _sha1_compress_n function or the like, and use
> when available. Actually, we probably need to do that anyway, since for
> historical reasons, _nettle_sha1_compress is a public function, and needs
> to be kept (as just a simple C wrapper) for backwards compatibility.
> Changing it incrementally should be doable but a bit hairy.
>
> There are some other similar compression functions with
> assembly implementation, for md5, sha256 and sha512. But there's no need
> to change them all at the same time, or at all.
>
> Regarding the MD_UPDATE macro, that one is defined in the public header
> file macros.h (which in retrospect was a mistake). So it's probably best
> to leave it unchanged. New macros for the new convention should be put
> into some internal header, e.g., md-internal.h.

I've started adding support for a sha1_compress_n function in this branch:
https://git.lysator.liu.se/mamonet/nettle/-/tree/sha1-compress-n
The function works and performs as expected. I also adapted the s390x and
arm64 sha1_compress implementations to the new compress function.
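To illustrate the multi-block convention discussed in this thread, here is a minimal portable C sketch. It is not nettle's actual code: the names sha1_compress_n, sha1_compress and sha1_compress_selftest, and the (state, blocks, input) argument order, are hypothetical. The multi-block function loads the five state words once, loops over the 64-byte blocks, and stores the state once at the end, while the old single-block entry point survives as a thin wrapper for backwards compatibility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_BLOCK_SIZE 64

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

/* Hypothetical multi-block compression: processes `blocks` consecutive
   64-byte blocks, loading the state once before the loop and storing it
   once after, and returns a pointer just past the consumed input. */
static const uint8_t *
sha1_compress_n(uint32_t *state, size_t blocks, const uint8_t *input)
{
  uint32_t a = state[0], b = state[1], c = state[2], d = state[3], e = state[4];
  for (; blocks > 0; blocks--, input += SHA1_BLOCK_SIZE) {
    uint32_t w[80];
    /* Message schedule: 16 big-endian words, expanded to 80. */
    for (int i = 0; i < 16; i++)
      w[i] = ((uint32_t)input[4*i] << 24) | ((uint32_t)input[4*i+1] << 16)
           | ((uint32_t)input[4*i+2] << 8) | input[4*i+3];
    for (int i = 16; i < 80; i++)
      w[i] = rol32(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);

    uint32_t aa = a, bb = b, cc = c, dd = d, ee = e;
    for (int i = 0; i < 80; i++) {
      uint32_t f, k;
      if (i < 20)      { f = (bb & cc) | (~bb & dd);            k = 0x5A827999; }
      else if (i < 40) { f = bb ^ cc ^ dd;                      k = 0x6ED9EBA1; }
      else if (i < 60) { f = (bb & cc) | (bb & dd) | (cc & dd); k = 0x8F1BBCDC; }
      else             { f = bb ^ cc ^ dd;                      k = 0xCA62C1D6; }
      uint32_t t = rol32(aa, 5) + f + ee + k + w[i];
      ee = dd; dd = cc; cc = rol32(bb, 30); bb = aa; aa = t;
    }
    a += aa; b += bb; c += cc; d += dd; e += ee;
  }
  state[0] = a; state[1] = b; state[2] = c; state[3] = d; state[4] = e;
  return input;
}

/* Backwards-compatible single-block entry point, kept as a thin wrapper. */
static void
sha1_compress(uint32_t *state, const uint8_t *input)
{
  (void) sha1_compress_n(state, 1, input);
}

/* Self-check: SHA-1("abc") fits in one padded block, and compress_n over
   two blocks must match two single-block calls. */
static void
sha1_compress_selftest(void)
{
  static const uint32_t iv[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE,
                                 0x10325476, 0xC3D2E1F0};
  static const uint32_t abc_digest[5] = {0xA9993E36, 0x4706816A, 0xBA3E2571,
                                         0x7850C26C, 0x9CD0D89D};
  uint8_t data[2 * SHA1_BLOCK_SIZE] = {0};
  data[0] = 'a'; data[1] = 'b'; data[2] = 'c';
  data[3] = 0x80;  /* padding bit */
  data[63] = 24;   /* message length in bits */
  memcpy(data + SHA1_BLOCK_SIZE, data, SHA1_BLOCK_SIZE);

  uint32_t s1[5], s2[5];
  memcpy(s1, iv, sizeof iv);
  sha1_compress(s1, data);
  assert(memcmp(s1, abc_digest, sizeof abc_digest) == 0);
  sha1_compress(s1, data + SHA1_BLOCK_SIZE);

  memcpy(s2, iv, sizeof iv);
  const uint8_t *end = sha1_compress_n(s2, 2, data);
  assert(end == data + 2 * SHA1_BLOCK_SIZE);
  assert(memcmp(s1, s2, sizeof s1) == 0);
}
```

In this C version the saving is only a few loads and stores per block, but in an accelerated assembly implementation the same shape lets constants and state stay in registers across the whole input, which is the overhead discussed for s390x.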
As expected, SHA1 update now performs on par with the OpenSSL function on
arm64.

Benchmark of executing examples/nettle-benchmark on arm64:

Algorithm        mode        Mbyte/s
sha1             update       849.82
openssl sha1     update       849.73

Benchmark of executing examples/nettle-benchmark on s390x:

Algorithm        mode        Mbyte/s
sha1             update      1791.25

On s390x, the new compress function is now twice as fast as the previous
single-block optimized function that uses the built-in SHA1 accelerator.

The x86, x86_64, and arm implementations still need to be adapted to the
new compress function, and the patch could be improved further in terms of
naming conventions and documentation.

regards,
Mamone

_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
