Niels Möller writes:

>> It could likely be speedup further by processing 2, 3 or 4 blocks in
>> parallel.
>
> I've given 2 blocks in parallel a try, but not quite working yet. My
> work-in-progress code below.
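The quoted idea, computing more than one ChaCha block at a time, can be illustrated in portable C. This is a hedged sketch, not the Nettle code or the linked POWER assembly. It assumes the RFC 8439 state layout (32-bit block counter in word 12), and the function name chacha20_2blocks is hypothetical. Two states that differ only in the counter run through the same 20 rounds with their operations interleaved, so every step offers two independent dependency chains that SIMD units or a multi-issue core can overlap.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))

/* One ChaCha quarter round applied to two independent states x[] and y[]
   (both must be in scope). Each line does the same step for both states,
   giving two independent operations per step. */
#define QR2(a, b, c, d) do {                                   \
    x[a] += x[b]; y[a] += y[b];                                \
    x[d] ^= x[a]; y[d] ^= y[a];                                \
    x[d] = ROTL32(x[d], 16); y[d] = ROTL32(y[d], 16);          \
    x[c] += x[d]; y[c] += y[d];                                \
    x[b] ^= x[c]; y[b] ^= y[c];                                \
    x[b] = ROTL32(x[b], 12); y[b] = ROTL32(y[b], 12);          \
    x[a] += x[b]; y[a] += y[b];                                \
    x[d] ^= x[a]; y[d] ^= y[a];                                \
    x[d] = ROTL32(x[d], 8); y[d] = ROTL32(y[d], 8);            \
    x[c] += x[d]; y[c] += y[d];                                \
    x[b] ^= x[c]; y[b] ^= y[c];                                \
    x[b] = ROTL32(x[b], 7); y[b] = ROTL32(y[b], 7);            \
  } while (0)

/* Hypothetical helper: produce two consecutive ChaCha20 output blocks
   (16 words each) from one input state. The second block uses counter
   word 12 plus one; no carry into word 13 is handled. */
static void chacha20_2blocks(uint32_t out[32], const uint32_t in[16])
{
  uint32_t x[16], y[16];
  memcpy(x, in, sizeof x);
  memcpy(y, in, sizeof y);
  y[12] += 1; /* next block counter */

  for (int i = 0; i < 10; i++) { /* 20 rounds = 10 double rounds */
    /* column rounds */
    QR2(0, 4, 8, 12); QR2(1, 5, 9, 13);
    QR2(2, 6, 10, 14); QR2(3, 7, 11, 15);
    /* diagonal rounds */
    QR2(0, 5, 10, 15); QR2(1, 6, 11, 12);
    QR2(2, 7, 8, 13); QR2(3, 4, 9, 14);
  }

  /* Feed-forward: add the respective input state to each result. */
  for (int i = 0; i < 16; i++) {
    out[i] = x[i] + in[i];
    out[16 + i] = y[i] + in[i];
  }
  out[16 + 12] += 1; /* second block's input had counter + 1 */
}
```

Calling it on the block-counter-1 state from the RFC 8439 section 2.3.2 example should reproduce that example's output in the first 16 words, and the last 16 words should match a separate call made with the counter advanced by one. Variants with a 64-bit block counter in words 12 and 13 would also need the carry into word 13, which this sketch ignores.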
I've got it into working shape now, at least for little-endian. See

https://git.lysator.liu.se/nettle/nettle/-/blob/ppc-chacha-2core/powerpc64/p7/chacha-2core.asm

Next steps:

1. Make it work for big-endian too.
2. Wire it up for fat builds.
3. Try out whether 4-way gives additional speedup.

Benchmarking is appreciated. Compare the master branch to the ppc-chacha-2core branch, both configured with --enable-power-altivec, and run

./examples/nettle-benchmark chacha

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
