Eric Young wrote:
Quoting Brian Gladman <[EMAIL PROTECTED]>:

Ian Grigg wrote:

Jack Lloyd also passed along lots of good comments I'd
like to forward (having gained permission) FTR.  I've
edited them for brevity and pertinence.


>> I'm obviously being naive here ... I had thought that the combined mode would
>> be faster, as it would run through the data once only, and that AES seems to
>> clip along faster than SHA1.

AFAIK all of the modes that use only one block cipher invocation per block of
input are patented. EAX and CCM both use two AES operations per block, and
byte-for-byte SHA-1 is 2-5x faster than AES (at least in the implementations
I've seen/used/written), so using AES+HMAC is probably going to be faster than
AES/EAX or AES/CCM. The obvious exception being boxes with hardware AES chips
and slow CPUs (eg, an ARM7 with an AES coprocessor), where AES will of course
be much faster than SHA-1.

Maybe my C implementation of SHA1 is hopeless but I get SHA1 on an x86 at about 17 cycles per byte (over 100,000 bytes) and AES in C at 21 cycles per byte.
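For readers who want to reproduce a cycles-per-byte figure like the one above, here is a minimal sketch of the conversion from a wall-clock timing. It times whatever SHA-1 the Python standard library's hashlib is linked against, not the C implementation discussed here, and the clock rate is an assumed placeholder that has to be replaced with the real value for the machine under test. The function names are hypothetical.

```python
import hashlib
import time


def cycles_per_byte(elapsed_seconds, clock_hz, nbytes):
    """Convert a wall-clock timing into CPU cycles spent per input byte."""
    return elapsed_seconds * clock_hz / nbytes


def time_sha1(nbytes=100_000, repeats=200):
    """Return the best observed time in seconds to hash nbytes with SHA-1."""
    data = bytes(nbytes)  # zero-filled buffer; SHA-1 speed does not depend on content
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.sha1(data).digest()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    ASSUMED_CLOCK_HZ = 1.0e9  # assumption: substitute the actual clock rate
    elapsed = time_sha1()
    cpb = cycles_per_byte(elapsed, ASSUMED_CLOCK_HZ, 100_000)
    print(f"SHA-1: {cpb:.1f} cycles/byte at the assumed clock rate")
```

Taking the best of many runs rather than the mean reduces the influence of scheduling noise, which matters when the buffer is only 100,000 bytes.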

So I would put these two algorithms at about the same speed in C. In consequence, I rather suspect that the 'two encryptions per block' cost might also apply to combined modes when AES is used with HMAC-SHA1.
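The arithmetic behind that suspicion can be sketched as below, using only the two C figures quoted above (AES at 21 and SHA1 at 17 cycles per byte). This is a deliberately simplified per-byte model: it assumes a two-pass mode costs exactly two AES operations per block, and it ignores HMAC's fixed per-message overhead and any key setup. The function names are hypothetical.

```python
def aes_plus_hmac_cost(aes_cpb, sha1_cpb):
    """Per-byte cost of encrypting with AES and authenticating with HMAC-SHA1.

    Simplification: HMAC's fixed per-message overhead is ignored.
    """
    return aes_cpb + sha1_cpb


def two_pass_aes_cost(aes_cpb):
    """Per-byte cost of a mode that runs AES twice per block (as EAX and CCM do)."""
    return 2 * aes_cpb


aes_c = 21.0   # cycles per byte, AES in C (figure quoted above)
sha1_c = 17.0  # cycles per byte, SHA1 in C (figure quoted above)

hmac_total = aes_plus_hmac_cost(aes_c, sha1_c)
two_pass_total = two_pass_aes_cost(aes_c)

print(hmac_total)                   # 38.0
print(two_pass_total)               # 42.0
print(f"{hmac_total / aes_c:.2f}")  # 1.81 AES-equivalents per byte
```

On these figures, AES with HMAC-SHA1 costs roughly 1.8 AES operations' worth of work per byte, which is close to the two operations per block of the two-pass modes.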

Are you running on a P4? ASM for SHA1 on a P4 takes about 11.9 cycles per byte. The P4 is a very touchy x86 implementation. On most other architectures I nearly always see SHA1 running a bit less than 2 times faster than AES. On AMD64, in asm, I have AES-CBC at 12.2 cycles per byte and SHA1 at 6.8. This is about as good a CPU as it gets (IPC near 3 for both implementations).

The SHA1 figure is for a P3 using VC++ set to generate code that will run on all Pentium family machines. I have not optimised the C code for any particular machine. 17/12 for C/ASM is a bit worse than I would have hoped for but is not that bad.

I would not be surprised to see an average AES/SHA1 speed comparison in the 1.5:2.5 range but I was a bit surprised to see Jack's 2.0:5.0 range.
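As a quick check of the figures quoted in this thread against the two ranges mentioned, the sketch below computes the AES/SHA1 cost ratio for each pair of measurements. It uses only numbers that appear in the thread; the labels and the helper names are hypothetical, and two data points are obviously not a survey.

```python
def aes_to_sha1_ratio(aes_cpb, sha1_cpb):
    """How many times more cycles per byte AES needs than SHA1."""
    return aes_cpb / sha1_cpb


def in_range(value, low, high):
    """True if value lies in the closed interval [low, high]."""
    return low <= value <= high


# (label, AES cycles/byte, SHA1 cycles/byte), all taken from the thread
measurements = [
    ("C, x86", 21.0, 17.0),
    ("asm, AMD64", 12.2, 6.8),
]

for label, aes_cpb, sha1_cpb in measurements:
    ratio = aes_to_sha1_ratio(aes_cpb, sha1_cpb)
    print(
        f"{label}: ratio {ratio:.2f}, "
        f"within 1.5-2.5: {in_range(ratio, 1.5, 2.5)}, "
        f"within 2.0-5.0: {in_range(ratio, 2.0, 5.0)}"
    )
```

With these inputs the C figures give a ratio of about 1.24 and the AMD64 asm figures about 1.79, so neither reaches the 2.0 lower bound of the wider range.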

I will have to see if VC++ can be coaxed down from 17 cycles per byte for SHA1 without giving up on code that runs on all Pentium compatible machines :-)

   Brian Gladman

