On Jul 29, 2010, at 15:39, Wei Dai wrote:
> Forwarding this to the list, in case someone else is curious why Crypto++ 
> tends to avoid intrinsics in favor of inline assembly.

Thanks; I appreciate seeing what kind of thought was behind this.

> 
> --------------------------------------------------
> From: "Wei Dai"
> 
>> A couple of reasons for not using intrinsics:
>> 
>> 1. Many compilers tend to be buggy when compiling complex (or sometimes even 
>> simple) intrinsics code. I suspect they don't test much on intrinsics. I 
>> often write the first version using intrinsics and then am forced to 
>> rewrite it in assembly because I can't work around the compiler bugs.

It may not be of interest to most people, but I'd be interested in seeing the 
version you start with included, even if it's not used, for a few reasons:

* Test case for bug reports to file against the compiler(s).  If the bugs 
aren't known, it's harder to fix them.

* Once the compiler bugs are fixed, as compiler optimizations continue to 
improve, performance when compiling the code with intrinsics is somewhat likely 
to improve, while an asm version frozen in time isn't likely to improve 
significantly.  (Maybe minor tweaks to the asm code, but probably no massive 
reordering of instructions, restructuring of local data organization, etc.  
Maybe I'm wrong, and you do invest that level of effort into tuning assembly 
code, but I think most people do not.)

* If some compiler gets fixed, and starts generating faster code than your asm 
code, and other compilers are still broken, you can drop the old asm code and 
start shipping the asm code generated by the working compiler.

* If the asm code performs better, it may suggest possible optimization 
opportunities for compiler hackers.

* The intrinsic version may be a better starting point for someone trying to 
micro-optimize an asm version for a different architecture with similar SIMD 
capabilities.

* Alternatively, someone might experiment with wrapping intrinsics in functions 
or classes which conditionally expand to either use intrinsics or open-code 
versions of the same functionality.
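A minimal sketch of the wrapper idea in the last bullet, assuming a compiler that defines `__SSE2__` when SSE2 code generation is enabled (GCC and Clang do). The function name `xor_block16` is hypothetical and is not taken from Crypto++; it only shows one call site expanding to either intrinsics or an open-coded loop.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical wrapper: XOR 16 bytes of src into dst.
// Callers see one interface; the body is chosen at compile time.
inline void xor_block16(std::uint8_t* dst, const std::uint8_t* src) {
#if defined(__SSE2__)
    // Intrinsic version: unaligned 128-bit loads, one XOR, one store.
    __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst));
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_xor_si128(a, b));
#else
    // Open-coded version with the same observable behavior.
    for (std::size_t i = 0; i < 16; ++i) {
        dst[i] ^= src[i];
    }
#endif
}
```

Because the choice is made by the preprocessor, this by itself does not address the run-time detection problem raised in point 2 below; it only keeps the intrinsic and open-coded forms side by side behind one name.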

>> 2. GCC doesn't allow SSE2 intrinsics, for example, unless you specify -msse2 
>> or -march= with a microarchitecture that supports SSE2, but then it might 
>> generate SSE2 code even for non-intrinsic code that does not go through CPU 
>> feature detection tests, which would cause a SIGILL on non-SSE2 CPUs. Last 
>> night I ended up writing my own versions of some intrinsic functions using 
>> inline assembly to work around this. See 
>> http://www.kaourantin.net/2006/09/gcc-challenges.html for someone's blog 
>> post complaining about this.

I'll point out that in some environments it's common, or at least possible, to 
compile several versions of a library for specific architecture variants and 
select one at program load time, so the library doesn't have to do run-time 
detection itself.  Alternatively, the program/library can be compiled to run on 
the local machine only (or only on machines meeting certain minimum platform 
specs that include SSE2, and maybe even SSSE3), in which case portability to 
another machine of similar architecture but lesser capabilities isn't important.

There have also been some interesting developments in GCC relating to compiling 
different functions for different architectures, but I haven't followed the 
details.
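One form this per-function approach can take is sketched below. It assumes a GCC or Clang recent enough to accept the x86 `target` function attribute together with intrinsics (for GCC that means 4.9 or later, which postdates this thread) and to provide `__builtin_cpu_supports`. The function names and the `HAVE_TARGET_ATTR` macro are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <tmmintrin.h>  // SSSE3 intrinsics (pulls in the SSE2 header too)
#define HAVE_TARGET_ATTR 1  // hypothetical macro for this sketch
#endif

// Portable fallback: reverse 16 bytes in place.
static void reverse16_portable(std::uint8_t* p) {
    for (std::size_t i = 0; i < 8; ++i) {
        std::uint8_t t = p[i];
        p[i] = p[15 - i];
        p[15 - i] = t;
    }
}

#ifdef HAVE_TARGET_ATTR
// Only this function is compiled with SSSE3 enabled; the rest of the
// translation unit keeps the baseline instruction set.
__attribute__((target("ssse3")))
static void reverse16_ssse3(std::uint8_t* p) {
    // Shuffle control: output byte i takes input byte 15 - i.
    const __m128i idx = _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7,
                                     8, 9, 10, 11, 12, 13, 14, 15);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(p), _mm_shuffle_epi8(v, idx));
}
#endif

// Dispatcher: checks the CPU at run time before touching SSSE3 code.
void reverse16(std::uint8_t* p) {
#ifdef HAVE_TARGET_ATTR
    if (__builtin_cpu_supports("ssse3")) {
        reverse16_ssse3(p);
        return;
    }
#endif
    reverse16_portable(p);
}
```

With this layout no `-mssse3` flag is needed on the command line, so the compiler has no licence to emit SSSE3 instructions outside `reverse16_ssse3`.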

Of course, conditionalizing both C++ and asm code to support these different 
modes in one source base, with optimum efficiency (test a flag and 
conditionally branch, when we know the flag is always zero on this machine??), 
would be messy, and I don't actually expect you to go out of your way to do it.  
But it occurs to me that if someone found a good speedup to the inner loop of 
SHA1, say, that depended on SSE2 or SSSE3 or SSE4.1 instructions, adding a test 
and conditional branch there might destroy the performance gain, and compiling 
separate versions of the function for each mode might turn out to be a better 
approach in that case.  (No, I don't have such a version.  Just thinking 
somebody might, someday....)
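To illustrate the "separate versions per mode" structure, here is a rough sketch in which the whole loop is instantiated once per mode and the choice is made a single time, outside the loop. Everything in it is hypothetical: `Isa`, `checksum_impl`, and `select_checksum` are made-up names, the "wide" path is a plain unrolled loop standing in for real SSE2/SSSE3/SSE4.1 code, and the `has_wide` flag would in practice come from a CPU feature probe.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical modes; Wide stands in for a SIMD-capable CPU.
enum class Isa { Portable, Wide };

// One copy of this function is compiled per mode. The test on I is a
// compile-time constant, so neither copy carries a run-time flag check.
template <Isa I>
std::uint32_t checksum_impl(const std::uint32_t* w, std::size_t n) {
    std::uint32_t acc = 0;
    std::size_t i = 0;
    if (I == Isa::Wide) {
        // Stand-in for a vectorized inner loop: four words per iteration.
        for (; i + 4 <= n; i += 4) {
            acc += w[i] + w[i + 1] + w[i + 2] + w[i + 3];
        }
    }
    // Scalar loop: the whole input in Portable mode, the tail in Wide mode.
    for (; i < n; ++i) {
        acc += w[i];
    }
    return acc;
}

using ChecksumFn = std::uint32_t (*)(const std::uint32_t*, std::size_t);

// Called once (e.g. at startup); callers keep the returned pointer and
// pay one indirect call per buffer instead of one branch per block.
ChecksumFn select_checksum(bool has_wide) {
    return has_wide ? &checksum_impl<Isa::Wide>
                    : &checksum_impl<Isa::Portable>;
}
```

Whether the indirect call is cheaper than a well-predicted branch inside the loop would have to be measured for the function in question; the sketch only shows how the flag test can be moved out of the hot path.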

Ken
