On Jul 29, 2010, at 15:39, Wei Dai wrote:
> Forwarding this to the list, in case someone else is curious why Crypto++
> tends to avoid intrinsics in favor of inline assembly.
Thanks; I appreciate seeing what kind of thought was behind this.

> --------------------------------------------------
> From: "Wei Dai"
>
>> A couple of reasons for not using intrinsics:
>>
>> 1. Many compilers tend to be buggy when compiling complex (or sometimes even
>> simple) intrinsics code. I suspect they don't test much on intrinsics. I
>> often write the first version using intrinsics and then are forced to
>> re-write in assembly because I can't work around the compiler bugs.

It may not be of interest to most people, but I'd be interested in seeing the version you start with included, even if it's not used, for a few reasons:

* Test case for bug reports to file against the compiler(s). If the bugs aren't known, it's harder to fix them.

* Once the compiler bugs are fixed, and as compiler optimizations continue to improve, performance when compiling the code with intrinsics is somewhat likely to improve, while an asm version frozen in time isn't likely to improve significantly. (Maybe minor tweaks to the asm code, but probably no massive reordering of instructions, restructuring of local data organization, etc. Maybe I'm wrong, and you do invest that level of effort into tuning assembly code, but I think most people do not.)

* If some compiler gets fixed and starts generating faster code than your asm code, while other compilers are still broken, you can drop the old asm code and start shipping the asm code generated by the working compiler.

* If the asm code performs better, it may suggest possible optimization opportunities for compiler hackers.

* The intrinsic version may be a better starting point for someone trying to micro-optimize an asm version for a different architecture with similar SIMD capabilities.

* Alternatively, someone might experiment with wrapping intrinsics in functions or classes which conditionally expand to use either intrinsics or open-coded versions of the same functionality.

>> 2. GCC doesn't allow SSE2 intrinsics for example unless you specify -msse2
>> or -march= where the microarchitecture supports SSE2, but then it might
>> generate SSE2 code even for non-intrinsic code that do not go through CPU
>> feature detection tests, which would cause a SIGILL on non-SSE2 CPUs. Last
>> night I ended up writing my own versions of some intrinsic functions using
>> inline assembly to work around this. See
>> http://www.kaourantin.net/2006/09/gcc-challenges.html for someone's blog
>> post complaining about this.

I'll point out that in some environments it's common, or at least possible, to compile different versions of a library for specific architecture variants and select one at program load time, so the library doesn't have to do run-time detection itself. It's also possible to compile the program/library to run on the local machine only (or only on machines meeting certain minimum platform specs that include SSE2, and maybe even SSSE3), in which case portability to another machine of similar architecture but different capabilities isn't important. There have also been some interesting developments in GCC relating to compiling different functions for different architectures, but I haven't followed the details.

Of course, conditionalizing both C++ and asm code to support these different modes in one source base, with optimum efficiency (test a flag and conditionally branch, when we know the flag is always zero on this machine??), would be messy, and I don't actually expect you to go out of your way to do it. But it occurs to me that if someone found a good speedup to the inner loop of SHA1, say, that depended on SSE2 or SSSE3 or SSE4.1 instructions, adding a test and conditional branch there might destroy the performance gain, and compiling separate versions of the function for each mode might turn out to be a better approach in that case. (No, I don't have such a version. Just thinking somebody might, someday....)
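To make the "separate versions of the function" idea concrete, here is a minimal sketch, not anything from Crypto++. It assumes a reasonably recent GCC (or Clang) on x86: the per-function `target` attribute lets one function use SSE2 intrinsics while the rest of the file is built with baseline flags, and `__builtin_cpu_supports` does the CPU check. As far as I know, including `<emmintrin.h>` without `-msse2` needs GCC 4.9 or later. The names `xor_block`, `xor_block_scalar`, `xor_block_sse2` and `select_xor_block` are made up for illustration, and a byte-wise XOR stands in for a real inner loop.

```cpp
// Sketch only: all function names here are hypothetical, not Crypto++ code.
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <emmintrin.h>   // SSE2 intrinsics
#define HAVE_X86_DISPATCH 1
#endif

// Portable fallback: XOR src into dst one byte at a time.
static void xor_block_scalar(uint8_t* dst, const uint8_t* src, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] ^= src[i];
}

#ifdef HAVE_X86_DISPATCH
// Only this function is compiled with SSE2 enabled, so the compiler has
// no licence to emit SSE2 instructions elsewhere in the file.
__attribute__((target("sse2")))
static void xor_block_sse2(uint8_t* dst, const uint8_t* src, size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(dst + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm_xor_si128(a, b));
    }
    for (; i < n; ++i) dst[i] ^= src[i];   // leftover tail bytes
}
#endif

typedef void (*xor_block_fn)(uint8_t*, const uint8_t*, size_t);

// Runs the feature test once and returns the matching implementation.
static xor_block_fn select_xor_block() {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return xor_block_sse2;
#endif
    return xor_block_scalar;
}

// Resolved once at start-up; callers pay an indirect call per block,
// not a flag test inside the inner loop.
static const xor_block_fn xor_block = select_xor_block();
```

Callers just write `xor_block(dst, src, len);`. The design point is that the feature test happens once, outside the hot loop, and each variant is a whole function compiled for its own instruction set. A build targeting only SSE2-capable machines could skip the pointer and call the SSE2 version directly.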
Ken
