Thus spake "Andy Polyakov" <[EMAIL PROTECTED]>
Ok. How about now?

Subject to SIGBUS on most platforms. It's easy to get carried away, score on x86, and render support for other platforms void, isn't it? I mean, do mind unaligned access!

Ah, that may have been why I didn't "fix" that code to use u32. More likely, it was a happy accident that I inherited the portability of the code I copied. I certainly introduced a few logic bugs of my own (which were quickly fixed by others)...

I'm curious if there's a significant performance difference between using u32 and u64; the former should be portable to all supported platforms, and may make the latter unnecessary.

I'd recommend [or even insist on] for (i=0;i<16/sizeof(long);i++) loops and letting the compiler unroll them: 4x4-byte chunks on 32-bit platforms and 2x8-byte chunks on 64-bit ones, without a single shred of "#if that-or-that" spaghetti and no unnecessary dependency on the totally unrelated bn.h. And once again, unaligned input/output is to be treated byte by byte.
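For reference, I take that to mean something like the following (the names are mine, not OpenSSL's actual API, and the word-at-a-time path assumes the caller can guarantee long alignment):

#include <stddef.h>

/* Hypothetical helper: caller must guarantee that out, in and iv are
 * aligned for long access, otherwise this is exactly the SIGBUS case. */
static void xor16_aligned(unsigned char *out, const unsigned char *in,
                          const unsigned char *iv)
{
    size_t i;

    /* 4x4-byte chunks on 32-bit platforms, 2x8-byte chunks on 64-bit ones */
    for (i = 0; i < 16 / sizeof(long); i++)
        ((long *)out)[i] = ((const long *)in)[i] ^ ((const long *)iv)[i];
}

/* Unaligned input/output handled byte by byte, as suggested above. */
static void xor16_bytes(unsigned char *out, const unsigned char *in,
                        const unsigned char *iv)
{
    size_t i;

    for (i = 0; i < 16; i++)
        out[i] = in[i] ^ iv[i];
}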

My experience is that, for blocks as short as we're discussing here, the tests for unaligned blocks usually defeat the benefit you get in the aligned case. Functions like memcpy() generally require a minimum size before they try any such trickery due to the cost of the test, and 16 bytes is probably on the edge for most platforms.

If you're using a platform that will transparently handle unaligned access (either in hardware or software), it's worth it, but IMHO not on code that has to work on platforms that don't.
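To be concrete, the sort of test I mean is something like this (a sketch reusing the hypothetical xor16_aligned/xor16_bytes helpers above); for a 16-byte block the pointer check and branch can cost about as much as the byte loop they're trying to skip:

#include <stddef.h>

/* Hypothetical dispatch; the alignment check itself is the cost in question. */
static void xor16(unsigned char *out, const unsigned char *in,
                  const unsigned char *iv)
{
    if ((((size_t)out | (size_t)in | (size_t)iv) % sizeof(long)) == 0)
        xor16_aligned(out, in, iv);   /* word-at-a-time fast path */
    else
        xor16_bytes(out, in, iv);     /* byte-by-byte fallback */
}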

Plus, if we're going to go that route, we should consider that some platforms have 128-bit XOR support in hardware; is it worth implementing that too?
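For example, with SSE2 intrinsics the whole 16-byte XOR collapses to a single pxor plus unaligned loads and stores; a sketch only, not something I'm actually proposing:

#include <emmintrin.h>

/* SSE2 sketch, hypothetical name: one 128-bit XOR for the whole block. */
static void xor16_sse2(unsigned char *out, const unsigned char *in,
                       const unsigned char *iv)
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)iv);

    _mm_storeu_si128((__m128i *)out, _mm_xor_si128(a, b));
}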

Is it really that widely used/important a mode? Enough to justify that much extra complexity for so little gain?

I hacked up a version of the AES code a while back that used SSE registers to pass the blocks around, do bitwise operations, etc. It was faster than the current version, but (IMHO) not enough to justify adding so much unportable hackery to the project. If one desperately needs speed, the existing approach is to use platform-specific asm, and that seems sufficient.

How much of this should be extended to other ciphers?  Should
xorN() and moveN() be part of the bignum code for reuse in other
modules?

I'd be opposed to this. If performance gets that important, a function call will hardly beat inline code anyway, even if the function is, say, 128-bit SSE2 and the inline code is just 4x32-bit. A.

When I find such things useful, I tend to put them in a module's headers as a static inline function; that gets the speed of a macro with the semantics and safety of a "real" function. Unfortunately, that approach probably won't work on all of the platforms OpenSSL supports due to all the ancient compilers floating around.
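Something like the following in a module header is what I have in mind (names hypothetical); the inline keyword itself is exactly the part those ancient compilers choke on:

/* Header-resident helper: macro-like speed, function-like semantics. */
static inline void xor16_hdr(unsigned char *out, const unsigned char *in,
                             const unsigned char *iv)
{
    int i;

    for (i = 0; i < 16; i++)
        out[i] = in[i] ^ iv[i];
}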

S

Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723             people.  Smart people surround themselves with
K5SSS                  smart people who disagree with them." --Aaron Sorkin
