On Wed, Nov 25, 2020 at 3:22 AM Niels Möller <[email protected]> wrote:
>
> Maamoun TK <[email protected]> writes:
>
> > On POWER9 I got the following benchmark result:
> >
> > ./configured:
> > chacha      encrypt  308.58
> > chacha      decrypt  325.87
> > ./configured --enable-power-altivec "master branch":
> > chacha      encrypt  342.15
> > chacha      decrypt  356.24
> > ./configured --enable-power-altivec "ppc-chacha-2core":
> > chacha      encrypt  648.97
> > chacha      decrypt  648.00
> >
> > It's gotten better with every further optimization on the core, great work.
>
> Nice. So almost a factor 2 speedup from doing 2 blocks in parallel. I
> wonder if one can get close to another factor of two by going to 4
> blocks. I hope to get the time to try that out, it should be fairly
> easy. (And if that does work out fine, maybe the code to do only 2 blocks
> could be removed).

Botan and Crypto++ both process 4 blocks at a time, and the two usually
hit about the same benchmark numbers.
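
To make "4 blocks at a time" concrete, here is a rough sketch in
portable C rather than AltiVec. It assumes one common layout, where each
array of four words stands in for a vector register holding the same
state word from four independent blocks. The names (qround, qround4,
add4, xor_rotl4, qround4_matches_scalar) are made up for illustration;
this is not the Nettle, Botan or Crypto++ code.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4  /* one lane per block processed in parallel */

static uint32_t rotl32(uint32_t x, int n)
{
    assert(n > 0 && n < 32);  /* a shift by 32 would be undefined */
    return (x << n) | (x >> (32 - n));
}

/* Scalar ChaCha quarter round on one block's words. */
void qround(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d = rotl32(*d ^ *a, 16);
    *c += *d; *b = rotl32(*b ^ *c, 12);
    *a += *b; *d = rotl32(*d ^ *a, 8);
    *c += *d; *b = rotl32(*b ^ *c, 7);
}

/* Lane-wise add: stands in for a single vector add instruction. */
static void add4(uint32_t x[LANES], const uint32_t y[LANES])
{
    for (int i = 0; i < LANES; i++)
        x[i] += y[i];
}

/* Lane-wise xor followed by rotate: stands in for vector xor + rotate. */
static void xor_rotl4(uint32_t x[LANES], const uint32_t y[LANES], int n)
{
    for (int i = 0; i < LANES; i++)
        x[i] = rotl32(x[i] ^ y[i], n);
}

/* The same quarter round applied to four blocks at once. */
void qround4(uint32_t a[LANES], uint32_t b[LANES],
             uint32_t c[LANES], uint32_t d[LANES])
{
    add4(a, b); xor_rotl4(d, a, 16);
    add4(c, d); xor_rotl4(b, c, 12);
    add4(a, b); xor_rotl4(d, a, 8);
    add4(c, d); xor_rotl4(b, c, 7);
}

/* Returns 1 if every lane of qround4 agrees with the scalar qround. */
int qround4_matches_scalar(void)
{
    uint32_t a[LANES], b[LANES], c[LANES], d[LANES];
    uint32_t sa[LANES], sb[LANES], sc[LANES], sd[LANES];

    for (int i = 0; i < LANES; i++) {
        /* Arbitrary inputs; only d differs per lane, like a block counter. */
        a[i] = sa[i] = 0x61707865u;
        b[i] = sb[i] = 0x03020100u;
        c[i] = sc[i] = 0x00000009u;
        d[i] = sd[i] = (uint32_t)i + 1u;
        qround(&sa[i], &sb[i], &sc[i], &sd[i]);
    }
    qround4(a, b, c, d);

    for (int i = 0; i < LANES; i++)
        if (a[i] != sa[i] || b[i] != sb[i] || c[i] != sc[i] || d[i] != sd[i])
            return 0;
    return 1;
}
```

The lanes never interact inside the rounds, which is why widening from 2
to 4 blocks is mostly a matter of having enough registers. The scalar
qround here can be checked against the quarter-round test vector in
RFC 7539, section 2.1.1.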

For Crypto++ on GCC112, mixed message sizes:

  * ChaCha20: 1200 MB/s, 2.9 cpb
  * ChaCha8: 2370 MB/s, 1.5 cpb

On an antique PowerMac G5:

  * ChaCha20: 400 MB/s, 4.9 cpb
  * ChaCha8: 725 MB/s, 2.6 cpb
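
For comparing the MB/s and cpb columns across machines, the two are
related through the clock rate. A small sketch of the conversion
follows; it assumes 1 MB = 10^6 bytes (if the benchmark actually counts
MiB the results shift by about 5%), and the function names are made up
for illustration.

```c
#include <assert.h>

/* Cycles per byte, given the clock in MHz and throughput in MB/s
 * (assumption: 1 MB = 10^6 bytes, so MHz / (MB/s) = cycles / byte). */
double cycles_per_byte(double clock_mhz, double mb_per_sec)
{
    assert(mb_per_sec > 0.0);
    return clock_mhz / mb_per_sec;
}

/* Inverse: the clock in MHz implied by a reported (MB/s, cpb) pair. */
double implied_clock_mhz(double mb_per_sec, double cpb)
{
    return mb_per_sec * cpb;
}
```

Under that assumption, implied_clock_mhz(1200, 2.9) gives 3480 and
implied_clock_mhz(400, 4.9) gives 1960, so the ChaCha20 figures above
correspond to clocks of roughly 3.5 GHz and 2.0 GHz.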

Bernstein's results are at https://bench.cr.yp.to/results-stream.html.
He's showing 9 cpb on a 2006 IBM PowerPC. His implementation has a lot
of opportunities for improvement. Also see
https://cr.yp.to/streamciphers/timings/estreambench/submissions/salsa20/chacha8/ppc-altivec/chacha.c.

Jeff
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
