Joachim Strömbergson <[email protected]> writes:

> Niels Möller wrote:
>> Benchmarking nettle's implementation on my office machine (core i5),
>>
>> algorithm    cycles/byte
>> arcfour              7.5
>> arcfour              3.75 (openssl)
>
> Side issue: Pretty big difference in performance also for arcfour.

Right, and this time in openssl's favour. I think that speed is quite
impressive. I haven't written any arcfour assembly for x86_64, but I
have tried earlier for x86 and sparc. It's a very serial loop doing one
byte at a time. It's tempting to try to do two bytes at a time, but the
easy way gives incorrect results when the i and j indices happen to
collide.
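
For reference, the loop in question looks roughly like the sketch below.
This is plain textbook RC4 in C, not nettle's or openssl's actual code,
and the names rc4_state, rc4_set_key and rc4_crypt are made up for
illustration. Each iteration needs the j and the S[] contents left
behind by the previous one, which is what makes it so serial.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not nettle's actual API. */
struct rc4_state {
    uint8_t S[256];
    uint8_t i, j;
};

/* Standard RC4 key schedule. keylen must be at least 1. */
static void rc4_set_key(struct rc4_state *ctx,
                        const uint8_t *key, size_t keylen)
{
    unsigned k;
    uint8_t j = 0;

    for (k = 0; k < 256; k++)
        ctx->S[k] = (uint8_t)k;
    for (k = 0; k < 256; k++) {
        uint8_t t = ctx->S[k];
        j = (uint8_t)(j + t + key[k % keylen]);
        ctx->S[k] = ctx->S[j];
        ctx->S[j] = t;
    }
    ctx->i = ctx->j = 0;
}

/* One byte per iteration: update i and j, swap S[i] and S[j],
   then look up the keystream byte at S[S[i] + S[j]]. */
static void rc4_crypt(struct rc4_state *ctx, uint8_t *dst,
                      const uint8_t *src, size_t n)
{
    uint8_t i = ctx->i, j = ctx->j;
    size_t k;

    for (k = 0; k < n; k++) {
        uint8_t si, sj;
        i = (uint8_t)(i + 1);
        si = ctx->S[i];
        j = (uint8_t)(j + si);       /* depends on the previous j */
        sj = ctx->S[j];
        ctx->S[i] = sj;              /* swap S[i] and S[j] */
        ctx->S[j] = si;
        dst[k] = src[k] ^ ctx->S[(uint8_t)(si + sj)];
    }
    ctx->i = i;
    ctx->j = j;
}
```

Calling rc4_set_key once and then rc4_crypt on successive buffers gives
the usual stream cipher behaviour; the same call decrypts.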

One approach I played a bit with was to nevertheless do two bytes at a
time, and then add some unlikely condition to detect collisions and fix
them. But I couldn't manage to make that fast.
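
To make the collision concrete, here is a small C sketch of one way
such a check could look. It is only an illustration under my own
assumptions, not the code I actually tried, and the names rc4_state,
rc4_step and rc4_step2 are hypothetical. It hoists just the load of
S[i+2] above the first swap; that load is stale exactly when the first
step's j equals i+2, in which case the swap has put the old S[i+1]
there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not nettle's actual API.
   The caller is assumed to have initialized S[] with the usual
   RC4 key schedule. */
struct rc4_state {
    uint8_t S[256];
    uint8_t i, j;
};

/* Reference: one keystream byte, textbook RC4. */
static uint8_t rc4_step(struct rc4_state *ctx)
{
    uint8_t si, sj;
    ctx->i = (uint8_t)(ctx->i + 1);
    si = ctx->S[ctx->i];
    ctx->j = (uint8_t)(ctx->j + si);
    sj = ctx->S[ctx->j];
    ctx->S[ctx->i] = sj;
    ctx->S[ctx->j] = si;
    return ctx->S[(uint8_t)(si + sj)];
}

/* Two keystream bytes per call, with S[i+1] and S[i+2] both
   loaded before any store. */
static void rc4_step2(struct rc4_state *ctx, uint8_t out[2])
{
    uint8_t *S = ctx->S;
    uint8_t i1 = (uint8_t)(ctx->i + 1);
    uint8_t i2 = (uint8_t)(ctx->i + 2);
    uint8_t a = S[i1];
    uint8_t b = S[i2];               /* early load, may be stale */
    uint8_t j1 = (uint8_t)(ctx->j + a);
    uint8_t sj, j2;

    /* Unlikely collision: the first swap stores a into S[i2],
       so the early load of b must be replaced by a. */
    if (j1 == i2)
        b = a;

    sj = S[j1];
    S[i1] = sj;                      /* first swap */
    S[j1] = a;
    out[0] = S[(uint8_t)(a + sj)];

    j2 = (uint8_t)(j1 + b);
    sj = S[j2];
    S[i2] = sj;                      /* second swap */
    S[j2] = b;
    out[1] = S[(uint8_t)(b + sj)];

    ctx->i = i2;
    ctx->j = j2;
}
```

For a well-mixed state the branch should be taken about once in 256
calls. A version that overlaps more of the two steps would need
further checks, for instance on the output lookups.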

An easier trick is to generate four or eight bytes of the keystream at a
time, collecting the result in a register, so the xoring of the data can be
done a word at a time. The sparc implementation does something along
those lines, and at least does the data writes as aligned words.
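
In C, the idea could be sketched as below. This is an illustration
only, not the sparc code: the names rc4_state and rc4_crypt_words are
hypothetical, the state is assumed to be already keyed, and the shift
order assumes a little-endian machine (a big-endian one such as sparc
would shift by 56 - 8*k instead). memcpy is used for the word accesses
so the sketch does not depend on the alignment of src and dst.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not nettle's actual API.
   The caller is assumed to have initialized S[] with the usual
   RC4 key schedule. */
struct rc4_state {
    uint8_t S[256];
    uint8_t i, j;
};

static void rc4_crypt_words(struct rc4_state *ctx, uint8_t *dst,
                            const uint8_t *src, size_t n)
{
    uint8_t *S = ctx->S;
    uint8_t i = ctx->i, j = ctx->j;
    uint8_t si, sj;

    while (n >= 8) {
        uint64_t ks = 0, w;
        unsigned k;

        /* Collect eight keystream bytes in one 64-bit value.
           Little-endian assumption: byte k lands at offset k. */
        for (k = 0; k < 8; k++) {
            i = (uint8_t)(i + 1);
            si = S[i];
            j = (uint8_t)(j + si);
            sj = S[j];
            S[i] = sj;
            S[j] = si;
            ks |= (uint64_t)S[(uint8_t)(si + sj)] << (8 * k);
        }

        /* One word load, one xor, one word store. */
        memcpy(&w, src, 8);
        w ^= ks;
        memcpy(dst, &w, 8);
        src += 8;
        dst += 8;
        n -= 8;
    }

    /* Leftover bytes, one at a time. */
    while (n > 0) {
        i = (uint8_t)(i + 1);
        si = S[i];
        j = (uint8_t)(j + si);
        sj = S[j];
        S[i] = sj;
        S[j] = si;
        *dst++ = *src++ ^ S[(uint8_t)(si + sj)];
        n--;
    }

    ctx->i = i;
    ctx->j = j;
}
```

The keystream generation is still one byte at a time; only the data
loads, xors and stores are reduced to one per word.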

But, I'd rather spend time on making salsa20 (and/or chacha) fast,
than optimizing arcfour.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs
