Eric Richter <eric...@linux.ibm.com> writes:

> This patch introduces an optimized powerpc64 assembly implementation for
> sha512-compress, derived from the implementation for sha256-compress-n.

Thanks, I'm about to merge this. One question: When you store
non-volatile registers on the stack,

> +     li      T0, -8
> +     li      T1, -24
> +     stvx    v20, T0, SP
> +     stvx    v21, T1, SP

Why the offsets -8 and -24? My understanding is that on entry, SP is
16-byte aligned ("quad word"), and that the 8 bytes starting at the SP
value typically hold the caller's back chain pointer, so we shouldn't
be clobbering those bytes? (I'm looking at the stack frame figure on
page 34 in the v2.1.5 ABI spec downloaded from
https://openpowerfoundation.org/specifications/64bitelfabi/).

I would have expected offsets -16 and -32. The file
powerpc64/p8/sha256-compress-n.asm, which I merged some months ago, also
uses -8 and -24, whereas powerpc64/p8/ghash-update.asm uses
offsets -16 and -32.
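
To make the arithmetic concrete, here is a minimal C sketch of the
address computation in question. It assumes (my reading of the Power
ISA, worth double-checking) that stvx ignores the low four bits of the
effective address, so the store always lands on a 16-byte boundary. The
helper name stvx_ea and the sample SP value are made up for
illustration and appear nowhere in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: the address stvx would store to, ASSUMING the
   instruction clears the low four bits of base + index. */
uint64_t
stvx_ea (uint64_t base, int64_t index)
{
  return (base + (uint64_t) index) & ~(uint64_t) 0xf;
}

/* Distance in bytes from a 16-byte aligned SP down to the start of
   the quadword written for a given (negative) offset. */
uint64_t
bytes_below_sp (uint64_t sp, int64_t index)
{
  return sp - stvx_ea (sp, index);
}
```

Under that assumption, with a 16-byte aligned SP, offsets -8 and -24
give 16 and 32 bytes below SP, the same quadwords as offsets -16 and
-32, and the 8 bytes at SP are left alone. If the assumption is wrong,
offset -8 would overlap the back chain pointer.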

I also think the GPR usage could be trimmed a bit. T0 and T1 are used
only in the function prologue and epilogue, and could overlap with
other uses of volatile registers. And the two registers TC32 and TC48
could be replaced by a single register STATE32 = STATE + 32. But that
can be tweaked after merge.
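
A rough C sketch of the STATE32 idea, to show that the same four
16-byte chunks are reached either way. It assumes the 64-byte state is
accessed as four quadwords at offsets 0, 16, 32 and 48, and that an
index constant for 16 is available anyway; both are my assumptions, and
the function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Variant A: one base pointer, four index constants (0, 16, 32, 48),
   corresponding to separate TC32 and TC48 registers. */
void
load_state_four_offsets (uint8_t out[64], const uint8_t *state)
{
  static const size_t tc[4] = { 0, 16, 32, 48 };
  for (int i = 0; i < 4; i++)
    memcpy (out + 16 * i, state + tc[i], 16);
}

/* Variant B: two base pointers, index constants 0 and 16 only.
   state32 plays the role of the proposed STATE32 = STATE + 32. */
void
load_state_two_bases (uint8_t out[64], const uint8_t *state)
{
  const uint8_t *state32 = state + 32;
  memcpy (out, state, 16);
  memcpy (out + 16, state + 16, 16);
  memcpy (out + 32, state32, 16);
  memcpy (out + 48, state32 + 16, 16);
}
```

Both variants read the same bytes; the second trades two index
constants for one extra base pointer, which is where the saved register
would come from.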

Regards,
/Niels

-- 
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list -- nettle-bugs@lists.lysator.liu.se
To unsubscribe send an email to nettle-bugs-le...@lists.lysator.liu.se
