Beware asm code paths on AMD 64 dual core

Jeremy S (sent by Nabble.com) Thu, 16 Feb 2006 14:50:14 -0800

Just wanted to share this in case it saves anyone else some headaches.

We have some code that uses Crypto++ that had been working great for ages on
everyone's Intel P4 box.  We recently switched to AMD 64 dual core boxes. 
Everyone started encountering periodic exceptions:


InvertibleRSAFunction: computational error during private key operation

when we were doing RSA decryption.  This was very rare -- when we tried this
in a loop, we'd hit the error maybe 1 in 3000 times.  If we tried this on a
thread other than the main thread, the error occurred much more often, maybe
1 in 20 times.

The exception was coming from InvertibleRSAFunction::CalculateInverse in
rsa.cpp.  I won't paste it in here, but basically that method does a bunch
of complicated math on a really big number (1024 bits in my usage), and in
the error case the result fails a cross-check.  Because the numbers are so
large it's hard to run the numbers yourself and see what's failing.
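To make the cross-check concrete, here is a toy sketch of the idea using the textbook key p=61, q=53, n=3233, e=17. It is not the Crypto++ code: the struct `ToyRsaKey`, the function names, and the `fault_mask` parameter are all made up for illustration, and it assumes the check works by re-applying the public exponent to the result and comparing against the input. The private operation is done in two halves (mod p and mod q) and recombined, so a single wrong low-level arithmetic result in either half produces an answer that fails the check.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical toy key layout; real keys are 1024+ bits and need a bignum type.
struct ToyRsaKey {
    std::uint64_t n, e;          // public modulus and exponent
    std::uint64_t p, q;          // prime factors of n
    std::uint64_t dp, dq, qinv;  // d mod (p-1), d mod (q-1), q^-1 mod p
};

// Square-and-multiply modular exponentiation; only safe for mod < 2^32.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Private-key operation with a cross-check on the result.
// fault_mask is XORed into one half to simulate a bad arithmetic result.
std::uint64_t private_op_checked(const ToyRsaKey& k, std::uint64_t x,
                                 std::uint64_t fault_mask = 0) {
    assert(x < k.n);
    std::uint64_t m1 = (mod_pow(x, k.dp, k.p) ^ fault_mask) % k.p;
    std::uint64_t m2 = mod_pow(x, k.dq, k.q);
    // Recombine the two halves (Garner's formula).
    std::uint64_t h = k.qinv * ((m1 + k.p - m2 % k.p) % k.p) % k.p;
    std::uint64_t y = m2 + h * k.q;
    // Cross-check: applying the public exponent must give back the input.
    if (mod_pow(y, k.e, k.n) != x)
        throw std::runtime_error("computational error during private key operation");
    return y;
}
```

With the key `{3233, 17, 61, 53, 53, 49, 38}`, `private_op_checked(key, 2790)` returns 65, while passing a `fault_mask` of 1 makes the cross-check throw.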

This also only happened in the release build of Crypto++; the debug build
never exhibited this problem.  Experimentation with the build switches
revealed that turning off all optimizations in the release build was the
only thing that inhibited this problem.  Turning off any individual
optimization made no difference.

Ultimately, the problem seems to be the hand-tuned asm in
PentiumOptimized::Add() and Subtract() in integer.cpp, which get called in
the course of doing the math in the function above.  (Crypto++ was
apparently selecting the P4Optimized implementation on our old Intel boxes
and then dropping down to PentiumOptimized on the AMD boxes because it looks
for "GenuineIntel" in the CPUID vendor string, and AMD processors report
"AuthenticAMD" there.)
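As a rough illustration of why the AMD boxes end up on a different path, the sketch below decodes the vendor ID that CPUID leaf 0 returns in EBX, EDX, ECX and then picks a code path from it. The register values in the usage note are the standard encodings of the two vendor IDs; the `AddPath` enum and `select_path` function are hypothetical and only mirror the behaviour described above. The real detection in integer.cpp may look at more than the vendor ID.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// CPUID leaf 0 packs the 12-byte vendor ID into EBX, EDX, ECX (in that
// order), four characters per register, lowest byte first.
std::string vendor_from_cpuid(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx) {
    const std::uint32_t regs[3] = {ebx, edx, ecx};
    std::string vendor;
    for (std::uint32_t reg : regs)
        for (int shift = 0; shift < 32; shift += 8)
            vendor.push_back(static_cast<char>((reg >> shift) & 0xFF));
    assert(vendor.size() == 12);
    return vendor;
}

// Hypothetical names for the code paths mentioned in this post.
enum class AddPath { P4Optimized, PentiumOptimized };

// Assumed selection rule: only an Intel vendor ID gets the P4 path,
// everything else falls back to the Pentium asm.
AddPath select_path(const std::string& vendor) {
    return vendor == "GenuineIntel" ? AddPath::P4Optimized
                                    : AddPath::PentiumOptimized;
}
```

Feeding in `0x756e6547, 0x49656e69, 0x6c65746e` yields "GenuineIntel", and `0x68747541, 0x69746e65, 0x444d4163` yields "AuthenticAMD", which this selection rule sends to the PentiumOptimized path.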

I forced the Portable (C) implementation to get used instead and the problem
went away entirely.  I still don't know if the problem is specifically in
the PentiumOptimized code path or also in P4Optimized; or exactly what was
going on that caused these functions to (apparently) not always return the
correct result when compiler optimizations were turned on.  I don't
particularly care.  I measured the performance difference and across the
operations I care about there is a 0-20% performance penalty for using the C
implementation of the low-level math.  I will gladly pay that penalty in
exchange for the safety of the C implementation.
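For anyone wondering what the low-level math amounts to, here is a minimal sketch of a portable multi-word add and subtract with carry/borrow propagation. It is not the Crypto++ Portable code; the function names, the 32-bit word size, and the least-significant-word-first layout are assumptions made for this example. The asm versions are supposed to compute exactly the same thing, only faster.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// c = a + b over n 32-bit words, least significant word first.
// Returns the carry out of the top word (0 or 1).
std::uint32_t portable_add(std::uint32_t* c, const std::uint32_t* a,
                           const std::uint32_t* b, std::size_t n) {
    assert(c != nullptr && a != nullptr && b != nullptr);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t sum = static_cast<std::uint64_t>(a[i]) + b[i] + carry;
        c[i] = static_cast<std::uint32_t>(sum);  // low 32 bits
        carry = sum >> 32;                       // carry into the next word
    }
    return static_cast<std::uint32_t>(carry);
}

// c = a - b over n 32-bit words, least significant word first.
// Returns the borrow out of the top word (0 or 1).
std::uint32_t portable_subtract(std::uint32_t* c, const std::uint32_t* a,
                                const std::uint32_t* b, std::size_t n) {
    assert(c != nullptr && a != nullptr && b != nullptr);
    std::uint64_t borrow = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Wraps modulo 2^64 when a[i] < b[i] + borrow, which sets bit 32.
        std::uint64_t diff = static_cast<std::uint64_t>(a[i]) - b[i] - borrow;
        c[i] = static_cast<std::uint32_t>(diff);
        borrow = (diff >> 32) & 1;
    }
    return static_cast<std::uint32_t>(borrow);
}
```

A reference like this is also handy for testing: run the asm routine and the portable one on the same random inputs and compare both the output words and the returned carry.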

Just wanted to post this in case it helps anyone who is googling for that
exception text.

Also, if anyone thinks I'm off base on what's going on here, I'm all ears...
-Jeremy
--
View this message in context: 
http://www.nabble.com/Beware-asm-code-paths-on-AMD-64-dual-core-t1137479.html#a2980248
Sent from the Crypto++ forum at Nabble.com.
