Hi Niels,

On Tue, 2022-01-04 at 20:54 +0100, Niels Möller wrote:
> [email protected] (Niels Möller) writes:
>
> > [email protected] (Niels Möller) writes:
> >
> > > I think it should be possible to reduce number of needed registers, and
> > > completely avoid using callee-save registers (load the values now in
> > > U4-U7 one at a time a bit closer to the place where they are needed in),
> > > and replace F3 with $1 in the FOLD and FOLDC macros.
> >
> > Attaching a variant to do this. Passes tests with qemu, but I haven't
> > benchmarked it on any real hardware.
>
> Would you like to test and benchmark this on relevant real hardware,
> before I merge this version?
>
> Code still below, and committed to the branch ppc-secp256-tweaks.

Compared to the current version in the master branch, this version
definitely improves the performance of the reduction code. On POWER9,
the reduction code shows a 7% speedup when tested separately.

The improvement in P256 sign/verify is marginal. Here are the numbers
from hogweed-benchmark on POWER9:

name    size   sign/ms   verify/ms
ecdsa    256   11.1013      3.5713   (master)
ecdsa    256   11.1527      3.6011   (this patch)
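
To put "marginal" in numbers, the relative change implied by the two rows can be worked out with a few lines of Python. This is only a sketch: the variable and function names are made up for illustration and are not part of hogweed-benchmark, and the figures are simply copied from the table above.

```python
# Throughput in operations per millisecond, copied from the table above.
master = {"sign": 11.1013, "verify": 3.5713}
patched = {"sign": 11.1527, "verify": 3.6011}


def relative_gain(before, after):
    """Return the throughput change as a percentage of the baseline."""
    return (after - before) / before * 100.0


# Hypothetical helper output: one percentage per operation.
gains = {op: relative_gain(master[op], patched[op]) for op in master}

for op, pct in gains.items():
    print(f"{op}: {pct:+.2f}%")
```

That works out to roughly +0.46% for sign and +0.83% for verify, well below the 7% seen for the reduction code in isolation.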

Amitay.
--
People on the net are always telling other people to "get a life." It
would be so much simpler if there were one available under GPL. "If you
use this life, you must tell other people where to get a life of their
own." - Christopher Davis