Just wanted to share this in case it saves anyone else some headaches. We have some code that uses Crypto++ that had been working great for ages on everyone's Intel P4 box. We recently switched to AMD 64 dual core boxes. Everyone started encountering intermittent exceptions:
InvertibleRSAFunction: computational error during private key operation

This happened when we were doing RSA decryption. It was very rare -- when we tried this in a loop, we'd hit the error maybe 1 in 3000 times. If we tried this on a thread other than the main thread, the error occurred much more often, maybe 1 in 20 times.

The exception was coming from InvertibleRSAFunction::CalculateInverse in rsa.cpp. I won't paste it in here, but basically that method does a bunch of complicated math on a really big number (1024 bits in my usage), and in the error case the result fails a cross-check. Because the numbers are so large it's hard to run the numbers yourself and see what's failing.

This also only happened in the release build of Crypto++; the debug build never exhibited this problem. Experimentation with the build switches revealed that turning off all optimizations in the release build was the only thing that inhibited this problem. Turning off any individual optimization made no difference.

Ultimately, the problem seems to be the hand-tuned asm in PentiumOptimized::Add() and Subtract() in integer.cpp, which gets called in the course of doing the math in the function above. (Crypto++ was apparently selecting the P4Optimized implementation on our old Intel boxes and then dropping down to PentiumOptimized on the AMD boxes because it looks for "GenuineIntel" in the processor name.) I forced the Portable (C) implementation to get used instead and the problem went away entirely.

I still don't know if the problem is specifically in the PentiumOptimized code path or also in P4Optimized, or exactly what was going on that caused these functions to (apparently) not always return the correct result when compiler optimizations were turned on. I don't particularly care. I measured the performance difference and across the operations I care about there is a 0-20% performance penalty for using the C implementation of the low-level math.
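For anyone wondering what the "C implementation of the low-level math" amounts to, here is a minimal sketch of multi-word add and subtract in plain C++. This is not the Crypto++ source: the names add_words and sub_words, the 32-bit word size, and the little-endian word order are all assumptions made for illustration. The point is only that the carry/borrow is tracked in ordinary variables that the compiler manages, rather than in CPU flags managed by hand-written asm.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Crypto++'s actual code): adds two n-word
// little-endian numbers, writes the low n words to out, and returns
// the carry out of the top word (0 or 1).
std::uint32_t add_words(std::uint32_t* out, const std::uint32_t* a,
                        const std::uint32_t* b, std::size_t n) {
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Widen to 64 bits so the carry lands in bit 32.
        std::uint64_t sum = static_cast<std::uint64_t>(a[i]) + b[i] + carry;
        out[i] = static_cast<std::uint32_t>(sum);
        carry = sum >> 32;
    }
    return static_cast<std::uint32_t>(carry);
}

// Hypothetical helper: computes a - b over n words and returns the
// borrow out of the top word (1 if b > a, else 0).
std::uint32_t sub_words(std::uint32_t* out, const std::uint32_t* a,
                        const std::uint32_t* b, std::size_t n) {
    std::uint32_t borrow = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Unsigned wraparound sets the upper 32 bits when a borrow occurs.
        std::uint64_t diff = static_cast<std::uint64_t>(a[i]) - b[i] - borrow;
        out[i] = static_cast<std::uint32_t>(diff);
        borrow = static_cast<std::uint32_t>((diff >> 32) & 1u);
    }
    return borrow;
}
```

A 1024-bit value would be 32 such words. The asm versions presumably do the same job with add-with-carry instructions, which is faster but leaves more room for trouble when the optimizer rearranges the surrounding code.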
I will gladly pay that penalty in exchange for the safety of the C implementation.

Just wanted to post this in case it helps anyone who is googling for that exception text. Also, if anyone thinks I'm off base on what's going on here, I'm all ears...

-Jeremy

--
View this message in context: http://www.nabble.com/Beware-asm-code-paths-on-AMD-64-dual-core-t1137479.html#a2980248
Sent from the Crypto++ forum at Nabble.com.
