On Wed, Nov 25, 2020 at 3:22 AM Niels Möller <[email protected]> wrote:
>
> Maamoun TK <[email protected]> writes:
>
> > On POWER9 I got the following benchmark result:
> >
> > ./configured:
> > chacha encrypt 308.58
> > chacha decrypt 325.87
> > ./configured --enable-power-altivec "master branch":
> > chacha encrypt 342.15
> > chacha decrypt 356.24
> > ./configured --enable-power-altivec "ppc-chacha-2core":
> > chacha encrypt 648.97
> > chacha decrypt 648.00
> >
> > It's gotten better with every further optimization on the core, great work.
>
> Nice. So almost a factor 2 speedup from doing 2 blocks in parallel. I
> wonder if one can get close to another factor of two by going to 4
> blocks. I hope to get the time to try that out, it should be fairly
> easy. (And if that does work out fine, maybe the code to do only 2 blocks
> could be removed).

Botan and Crypto++ use 4x blocks. They usually hit about the same
benchmark numbers.

For Crypto++ on GCC112, mixed message sizes:

  * ChaCha20: 1200 MB/s, 2.9 cpb
  * ChaCha8: 2370 MB/s, 1.5 cpb

On an antique PowerMac G5:

  * ChaCha20: 400 MB/s, 4.9 cpb
  * ChaCha8: 725 MB/s, 2.6 cpb

Bernstein's results are at
https://bench.cr.yp.to/results-stream.html. He's showing 9 cpb on a
2006 IBM PowerPC. His implementation has a lot of opportunities for
improvement. Also see
https://cr.yp.to/streamciphers/timings/estreambench/submissions/salsa20/chacha8/ppc-altivec/chacha.c.

Jeff
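
To illustrate what processing 4 blocks at once means, here is a minimal
portable C sketch of a ChaCha quarter round applied lane-wise to four
independent blocks. It is not the Nettle, Botan, or Crypto++ code; the
names LANES, quarter_round_lanes, and check_rfc7539_vector are
hypothetical and chosen only for this example. Each array holds the same
state word from four different blocks, so each loop body corresponds
roughly to one vector instruction on a 4 x 32-bit SIMD unit. The check
uses the quarter-round test vector from RFC 7539, section 2.1.1.

```c
#include <stdint.h>

#define LANES 4  /* hypothetical: number of blocks processed side by side */

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* One ChaCha quarter round on the same four state words of LANES
   independent blocks. a[i], b[i], c[i], d[i] belong to block i. */
static void
quarter_round_lanes(uint32_t a[LANES], uint32_t b[LANES],
                    uint32_t c[LANES], uint32_t d[LANES])
{
  int i;
  for (i = 0; i < LANES; i++) { a[i] += b[i]; d[i] = rotl32(d[i] ^ a[i], 16); }
  for (i = 0; i < LANES; i++) { c[i] += d[i]; b[i] = rotl32(b[i] ^ c[i], 12); }
  for (i = 0; i < LANES; i++) { a[i] += b[i]; d[i] = rotl32(d[i] ^ a[i], 8); }
  for (i = 0; i < LANES; i++) { c[i] += d[i]; b[i] = rotl32(b[i] ^ c[i], 7); }
}

/* Load the RFC 7539 section 2.1.1 input into every lane, run the
   quarter round, and return 1 if all lanes match the expected output. */
static int
check_rfc7539_vector(void)
{
  uint32_t a[LANES], b[LANES], c[LANES], d[LANES];
  int i;

  for (i = 0; i < LANES; i++)
    {
      a[i] = 0x11111111; b[i] = 0x01020304;
      c[i] = 0x9b8d6f43; d[i] = 0x01234567;
    }

  quarter_round_lanes(a, b, c, d);

  for (i = 0; i < LANES; i++)
    if (a[i] != 0xea2a92f4 || b[i] != 0xcb1cf8ce
        || c[i] != 0x4581472e || d[i] != 0x5881c4bb)
      return 0;
  return 1;
}
```

In a real 4-way implementation the lanes would differ only in the block
counter word, and the arrays would live in vector registers rather than
in memory.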
