On Tue, Aug 10, 2021 at 11:55 PM Niels Möller <[email protected]> wrote:

> Maamoun TK <[email protected]> writes:
>
> > I made a merge request in the main repository that optimizes SHA1 for
> > s390x architecture with fat build support !33
> > <https://git.lysator.liu.se/nettle/nettle/-/merge_requests/33>.
>
> Regarding the discussion on
> https://git.lysator.liu.se/nettle/nettle/-/merge_requests/33#note_10005:
> It seems the sha1 instructions on s390x are fast enough that the
> overhead of loading constants, and loading and storing the state, all
> per block, is a significant cost.
>
> I think it makes sense to change the internal convention for
> _sha1_compress so that it can do multiple blocks. There are currently 5
> assembly implementations that would need updating: arm/v6, arm64/crypto,
> x86, x86_64 and x86_64/sha_ni. And the C implementation, of course.
>
> If it turns out to be too large a change to do them all at once, one
> could introduce some new _sha1_compress_n function or the like, and use
> when available. Actually, we probably need to do that anyway, since for
> historical reasons, _nettle_sha1_compress is a public function, and needs
> to be kept (as just a simple C wrapper) for backwards compatibility.
> Changing it incrementally should be doable but a bit hairy.
>
> There are some other similar compression functions with
> assembly implementation, for md5, sha256 and sha512. But there's no need
> to change them all at the same time, or at all.
>
> Regarding the MD_UPDATE macro, that one is defined in the public header
> file macros.h (which in retrospect was a mistake). So it's probably best
> to leave it unchanged. New macros for the new convention should be put
> into some internal header, e.g., md-internal.h.
>

I've added initial support for a sha1_compress_n function in this branch:
https://git.lysator.liu.se/mamonet/nettle/-/tree/sha1-compress-n
The function works and performs as expected. I also adapted the s390x and
arm64 implementations of sha1_compress to the new compress function.
As predicted, SHA1 update on arm64 now performs on par with the OpenSSL
function. Output of examples/nettle-benchmark on arm64:
         Algorithm      mode      Mbyte/s
         sha1           update     849.82
         openssl sha1   update     849.73
Output of examples/nettle-benchmark on s390x:
         Algorithm      mode      Mbyte/s
         sha1           update    1791.25
On s390x, the new compress function is twice as fast as the single-block
optimized function that uses the built-in SHA1 accelerator.
The x86, x86_64, and arm implementations still need to be adapted to the
new compress function, and the patch could be improved further in terms of
naming conventions and documentation.
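
To make the convention under discussion concrete, here is a minimal,
portable C sketch of a multi-block compress function with a single-block
wrapper kept on top of it. The names sketch_sha1_compress_n and
sketch_sha1_compress, the argument order, and the returned pointer are
assumptions chosen for illustration; they are not necessarily what the
branch above uses. The point is only that the state is loaded and stored
once per call rather than once per block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SHA1_BLOCK_SIZE 64

static uint32_t rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Hypothetical multi-block convention: compress `blocks` consecutive
   64-byte blocks and return a pointer just past the consumed input.
   The state is read once before the loop and written once after it. */
static const uint8_t *
sketch_sha1_compress_n(uint32_t state[5], size_t blocks, const uint8_t *data)
{
  assert(state != NULL && (blocks == 0 || data != NULL));
  uint32_t h0 = state[0], h1 = state[1], h2 = state[2];
  uint32_t h3 = state[3], h4 = state[4];

  for (; blocks > 0; blocks--, data += SKETCH_SHA1_BLOCK_SIZE) {
    uint32_t w[80];
    uint32_t a = h0, b = h1, c = h2, d = h3, e = h4;
    int i;

    /* Load the block as 16 big-endian words, then expand to 80. */
    for (i = 0; i < 16; i++)
      w[i] = ((uint32_t)data[4 * i] << 24) | ((uint32_t)data[4 * i + 1] << 16)
           | ((uint32_t)data[4 * i + 2] << 8) | (uint32_t)data[4 * i + 3];
    for (i = 16; i < 80; i++)
      w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    /* 80 rounds with the standard SHA-1 round functions and constants. */
    for (i = 0; i < 80; i++) {
      uint32_t f, k, t;
      if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
      else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
      else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
      t = rotl32(a, 5) + f + e + k + w[i];
      e = d; d = c; c = rotl32(b, 30); b = a; a = t;
    }
    h0 += a; h1 += b; h2 += c; h3 += d; h4 += e;
  }

  state[0] = h0; state[1] = h1; state[2] = h2;
  state[3] = h3; state[4] = h4;
  return data;
}

/* Single-block entry point kept as a thin wrapper, in the spirit of
   keeping the old public function for backwards compatibility. */
static void
sketch_sha1_compress(uint32_t state[5], const uint8_t *data)
{
  sketch_sha1_compress_n(state, 1, data);
}
```

An update routine built on this would pass every full block in the
caller's buffer to a single sketch_sha1_compress_n call and use the
returned pointer to find the leftover tail bytes.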

regards,
Mamone
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
