Niels Möller <[email protected]> writes:

>> It could likely be sped up further by processing 2, 3 or 4 blocks in
>> parallel.
>
> I've given 2 blocks in parallel a try, but it's not quite working yet. My
> work-in-progress code below.

I've got it into working shape now, at least for little-endian. See
https://git.lysator.liu.se/nettle/nettle/-/blob/ppc-chacha-2core/powerpc64/p7/chacha-2core.asm
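
For readers who haven't looked at the assembly, the 2-way idea can be
sketched in portable C roughly as follows. This is only an illustration
under stated assumptions, not the code on the branch: the names
chacha_core_1way and chacha_core_2way are hypothetical, and the block
counter is assumed to be the 64-bit counter in state words 12 and 13.
The second block uses counter + 1, and the quarter rounds of the two
independent blocks are interleaved so they can execute in parallel.

```c
#include <stdint.h>
#include <string.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* One ChaCha quarter round on four state words. */
#define QROUND(a, b, c, d) do {             \
    a += b; d ^= a; d = ROTL32(d, 16);      \
    c += d; b ^= c; b = ROTL32(b, 12);      \
    a += b; d ^= a; d = ROTL32(d, 8);       \
    c += d; b ^= c; b = ROTL32(b, 7);       \
  } while (0)

/* The same quarter round applied to two independent blocks x and y. */
#define QROUND2(x, y, i, j, k, l) do {      \
    QROUND(x[i], x[j], x[k], x[l]);         \
    QROUND(y[i], y[j], y[k], y[l]);         \
  } while (0)

/* Reference: one block at a time (hypothetical helper, not nettle's API). */
static void
chacha_core_1way(uint32_t out[16], const uint32_t in[16], unsigned rounds)
{
  uint32_t x[16];
  unsigned i;

  memcpy(x, in, sizeof(x));
  for (i = 0; i < rounds; i += 2)
    {
      /* Column round. */
      QROUND(x[0], x[4], x[8],  x[12]);
      QROUND(x[1], x[5], x[9],  x[13]);
      QROUND(x[2], x[6], x[10], x[14]);
      QROUND(x[3], x[7], x[11], x[15]);
      /* Diagonal round. */
      QROUND(x[0], x[5], x[10], x[15]);
      QROUND(x[1], x[6], x[11], x[12]);
      QROUND(x[2], x[7], x[8],  x[13]);
      QROUND(x[3], x[4], x[9],  x[14]);
    }
  for (i = 0; i < 16; i++)
    out[i] = x[i] + in[i];
}

/* Two consecutive blocks at once: out0 for counter n, out1 for n + 1.
   Assumes a 64-bit counter in words 12 (low) and 13 (high). */
static void
chacha_core_2way(uint32_t out0[16], uint32_t out1[16],
                 const uint32_t in[16], unsigned rounds)
{
  uint32_t in1[16], x[16], y[16];
  unsigned i;

  memcpy(in1, in, sizeof(in1));
  if (++in1[12] == 0)   /* propagate carry into the high counter word */
    in1[13]++;

  memcpy(x, in, sizeof(x));
  memcpy(y, in1, sizeof(y));
  for (i = 0; i < rounds; i += 2)
    {
      QROUND2(x, y, 0, 4, 8,  12);
      QROUND2(x, y, 1, 5, 9,  13);
      QROUND2(x, y, 2, 6, 10, 14);
      QROUND2(x, y, 3, 7, 11, 15);
      QROUND2(x, y, 0, 5, 10, 15);
      QROUND2(x, y, 1, 6, 11, 12);
      QROUND2(x, y, 2, 7, 8,  13);
      QROUND2(x, y, 3, 4, 9,  14);
    }
  for (i = 0; i < 16; i++)
    {
      out0[i] = x[i] + in[i];
      out1[i] = y[i] + in1[i];
    }
}
```

In the vector version, the two blocks' words would sit in separate
lanes of the same registers rather than in two arrays, but the data
flow is the same.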

Next steps:

1. Fix it to also work for big-endian.

2. Wire it up for fat builds.

3. Check whether 4-way gives an additional speedup.

Benchmarking is appreciated. Compare the master branch to the
ppc-chacha-2core branch, configured with --enable-power-altivec, and run
./examples/nettle-benchmark chacha.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
