Hi, I tried out a micro-optimization of the ARM NEON implementation of chacha and salsa20. It gave a 10% speedup on the older Cortex-A5 core, but it's unclear whether it's an improvement overall, so I don't want to push it to master. I've removed that commit from master-updates; it's now on its own branch, arm-salsa20-chacha-vsra, in case anyone is curious.

I'm considering changing the internal _salsa20_core and _chacha_core functions to process more than one block at a time, since processing a few blocks in parallel has great potential for performance improvements.
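To illustrate why multi-block processing helps: within a single block, each quarter-round is a serial chain of add, xor and rotate, so a core handling one block leaves execution units (or NEON lanes) idle waiting on dependencies. Two blocks with consecutive counters are completely independent, so their rounds can be interleaved. The portable C below is only a sketch of that idea, not Nettle's code. The name chacha_2core_sketch is hypothetical, and it assumes the original ChaCha state layout with a 64-bit block counter in words 12-13.

```c
#include <stdint.h>
#include <string.h>

#define ROTL32(n, x) (((x) << (n)) | ((x) >> (32 - (n))))

/* One ChaCha quarter-round on words a, b, c, d of state array x. */
#define QR(x, a, b, c, d) do {                               \
    x[a] += x[b]; x[d] ^= x[a]; x[d] = ROTL32(16, x[d]);     \
    x[c] += x[d]; x[b] ^= x[c]; x[b] = ROTL32(12, x[b]);     \
    x[a] += x[b]; x[d] ^= x[a]; x[d] = ROTL32(8, x[d]);      \
    x[c] += x[d]; x[b] ^= x[c]; x[b] = ROTL32(7, x[b]);      \
  } while (0)

/* Hypothetical two-block core: writes 32 words (two 64-byte keystream
   blocks) to dst. src is the 16-word input state; the second block uses
   the 64-bit counter in words 12-13 incremented by one (assumed layout). */
static void
chacha_2core_sketch(uint32_t *dst, const uint32_t *src, unsigned rounds)
{
  uint32_t in1[16], x0[16], x1[16];
  unsigned i;

  memcpy(in1, src, sizeof(in1));
  in1[12]++;
  in1[13] += (in1[12] == 0);   /* carry into the high counter word */

  memcpy(x0, src, sizeof(x0));
  memcpy(x1, in1, sizeof(x1));

  for (i = 0; i < rounds; i += 2)
    {
      /* Column round. The x0 and x1 chains are independent, so the
         CPU (or a SIMD implementation) can overlap them. */
      QR(x0, 0, 4, 8, 12);  QR(x1, 0, 4, 8, 12);
      QR(x0, 1, 5, 9, 13);  QR(x1, 1, 5, 9, 13);
      QR(x0, 2, 6, 10, 14); QR(x1, 2, 6, 10, 14);
      QR(x0, 3, 7, 11, 15); QR(x1, 3, 7, 11, 15);
      /* Diagonal round. */
      QR(x0, 0, 5, 10, 15); QR(x1, 0, 5, 10, 15);
      QR(x0, 1, 6, 11, 12); QR(x1, 1, 6, 11, 12);
      QR(x0, 2, 7, 8, 13);  QR(x1, 2, 7, 8, 13);
      QR(x0, 3, 4, 9, 14);  QR(x1, 3, 4, 9, 14);
    }

  /* Feed-forward: add each block's input state to its permuted state. */
  for (i = 0; i < 16; i++)
    {
      dst[i] = x0[i] + src[i];
      dst[16 + i] = x1[i] + in1[i];
    }
}
```

The output should match two calls to a one-block core with counters n and n+1; the gain, if any, comes purely from the extra instruction-level parallelism, and would have to be measured on each target core.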
Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
