Joachim Strömbergson <[email protected]> writes:

> Niels Möller wrote:
>> Benchmarking nettle's implementation on my office machine (core i5),
>>
>> algorithm   cycles/byte
>> arcfour     7.5
>> arcfour     3.75 (openssl)
>
> Side issue: Pretty big difference in performance also for arcfour.

Right, and this time in openssl's favour. I think that speed is quite
impressive.

I haven't written any arcfour assembly for x86_64, but I have tried
earlier for x86 and sparc. It's a very serial loop doing one byte at a
time. It's tempting to try to do two bytes at a time, but the easy way
gives incorrect results when the i and j indices happen to collide. One
approach I played a bit with was to nevertheless do two bytes at a time,
and then add some unlikely condition to detect collisions and fix them.
But I couldn't manage to make that fast.

An easier trick is to generate four or eight bytes of the keystream at a
time, collecting the result in a register, so the xoring of the data can
be done a word at a time. The sparc implementation does something along
those lines, and at least does the data writes as aligned words.

But I'd rather spend time on making salsa20 (and/or chacha) fast than on
optimizing arcfour.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
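
For illustration only, here is a minimal C sketch of the word-at-a-time
trick mentioned above: generate eight keystream bytes, then xor them into
the data as one 64-bit word. This is not nettle's code and not the sparc
assembly. The names rc4_ctx, rc4_set_key, rc4_byte and rc4_crypt_words
are made up for this example, and the small local buffer stands in for
the register an assembly version would use. Going through memcpy keeps
the loads and stores alignment-safe and independent of byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context type for this sketch (not nettle's arcfour_ctx). */
struct rc4_ctx { uint8_t S[256]; uint8_t i, j; };

/* Standard RC4 key schedule. */
static void rc4_set_key(struct rc4_ctx *ctx, const uint8_t *key, size_t keylen)
{
  unsigned i, j = 0;
  assert(keylen > 0);
  for (i = 0; i < 256; i++)
    ctx->S[i] = (uint8_t) i;
  for (i = 0; i < 256; i++)
    {
      uint8_t t = ctx->S[i];
      j = (j + t + key[i % keylen]) & 0xff;
      ctx->S[i] = ctx->S[j];
      ctx->S[j] = t;
    }
  ctx->i = ctx->j = 0;
}

/* One step of the serial loop: each byte depends on the swap done for
   the previous one, which is why the loop is hard to parallelize. */
static uint8_t rc4_byte(struct rc4_ctx *ctx)
{
  uint8_t si, sj;
  ctx->i++;                      /* uint8_t wraps modulo 256 */
  si = ctx->S[ctx->i];
  ctx->j += si;
  sj = ctx->S[ctx->j];
  ctx->S[ctx->i] = sj;           /* swap S[i] and S[j] */
  ctx->S[ctx->j] = si;
  return ctx->S[(uint8_t) (si + sj)];
}

/* Generate 8 keystream bytes at a time and xor them as a single word.
   dst and src may be the same buffer. */
static void rc4_crypt_words(struct rc4_ctx *ctx, uint8_t *dst,
                            const uint8_t *src, size_t n)
{
  while (n >= 8)
    {
      uint8_t kb[8];             /* stands in for a 64-bit register */
      uint64_t ks, w;
      unsigned k;
      for (k = 0; k < 8; k++)
        kb[k] = rc4_byte(ctx);
      memcpy(&ks, kb, 8);
      memcpy(&w, src, 8);        /* unaligned-safe load */
      w ^= ks;                   /* one xor for eight data bytes */
      memcpy(dst, &w, 8);        /* unaligned-safe store */
      src += 8; dst += 8; n -= 8;
    }
  while (n-- > 0)                /* leftover bytes, one at a time */
    *dst++ = *src++ ^ rc4_byte(ctx);
}
```

Note that this only batches the xor and the memory accesses. The
keystream generation itself is still one byte per step, so it does not
address the i/j collision problem of the two-bytes-at-a-time approach.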
