Hi, I tried out a micro-optimization of the ARM NEON implementation of
chacha and salsa20. It gave a 10% speedup on the older Cortex-A5 core,
but it's unclear whether it's an improvement overall, so I don't want to
push it to master. I've removed that commit from master-updates (it now
lives on its own branch, arm-salsa20-chacha-vsra, in case anyone is
curious). I'm considering changing the internal _salsa20_core and
_chacha_core functions to process more than one block at a time, since
processing a few blocks in parallel has great potential for performance
improvements.
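To make the multi-block idea concrete, here is a minimal portable C sketch. It is not NEON code and not nettle's implementation. The names chacha_1core and chacha_2core are hypothetical, and it assumes the original ChaCha state layout in which words 12 and 13 hold a 64-bit block counter. The two-block variant computes two consecutive blocks with their quarter rounds interleaved. Because the two dependency chains are independent, the compiler or CPU can overlap them (on NEON, they could also be packed into separate vector registers).

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

/* The standard ChaCha quarter round on four state words. */
#define QROUND(a, b, c, d) do {              \
    a += b; d ^= a; d = ROTL32(16, d);       \
    c += d; b ^= c; b = ROTL32(12, b);       \
    a += b; d ^= a; d = ROTL32(8, d);        \
    c += d; b ^= c; b = ROTL32(7, b);        \
  } while (0)

/* Reference: one block at a time (hypothetical name). */
void
chacha_1core(uint32_t *dst, const uint32_t *src, unsigned rounds)
{
  uint32_t x[16];
  unsigned i;

  memcpy(x, src, sizeof(x));
  for (i = 0; i < rounds; i += 2)
    {
      /* Column round */
      QROUND(x[0], x[4], x[8],  x[12]);
      QROUND(x[1], x[5], x[9],  x[13]);
      QROUND(x[2], x[6], x[10], x[14]);
      QROUND(x[3], x[7], x[11], x[15]);
      /* Diagonal round */
      QROUND(x[0], x[5], x[10], x[15]);
      QROUND(x[1], x[6], x[11], x[12]);
      QROUND(x[2], x[7], x[8],  x[13]);
      QROUND(x[3], x[4], x[9],  x[14]);
    }
  for (i = 0; i < 16; i++)
    dst[i] = x[i] + src[i];
}

/* Apply the same quarter round to both blocks; the two calls are
   independent, so their instructions can be scheduled in parallel. */
#define QR2(i, j, k, l) do {                 \
    QROUND(x[i], x[j], x[k], x[l]);          \
    QROUND(y[i], y[j], y[k], y[l]);          \
  } while (0)

/* Two consecutive blocks at once (hypothetical name). dst receives
   32 words: block n in dst[0..15], block n+1 in dst[16..31].
   Assumes a 64-bit counter in words 12 (low) and 13 (high). */
void
chacha_2core(uint32_t *dst, const uint32_t *src, unsigned rounds)
{
  uint32_t src1[16], x[16], y[16];
  unsigned i;

  /* Second block's input: same state with the counter incremented. */
  memcpy(src1, src, sizeof(src1));
  src1[12]++;
  if (src1[12] == 0)
    src1[13]++;

  memcpy(x, src, sizeof(x));
  memcpy(y, src1, sizeof(y));
  for (i = 0; i < rounds; i += 2)
    {
      QR2(0, 4, 8, 12); QR2(1, 5, 9, 13);
      QR2(2, 6, 10, 14); QR2(3, 7, 11, 15);
      QR2(0, 5, 10, 15); QR2(1, 6, 11, 12);
      QR2(2, 7, 8, 13); QR2(3, 4, 9, 14);
    }
  for (i = 0; i < 16; i++)
    {
      dst[i] = x[i] + src[i];
      dst[16 + i] = y[i] + src1[i];
    }
}
```

The output of chacha_2core should match two chacha_1core calls on counters n and n+1. A real NEON version would keep the per-block state in vector registers rather than relying on the compiler to interleave scalar code.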

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs