Further experimentation revealed that on our AMD boxes, IsP4() returns false
(it fails the "GenuineIntel" CPUID vendor check), but HasSSE2() returns true.
As a result, SetPentiumFunctionPointers() chooses PentiumOptimized for Add and
Subtract, but P4Optimized for the Multiply functions.  I hadn't noticed this
when I made the previous post.
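To make the mismatch concrete, here is a minimal sketch that models the two
independent checks.  CpuInfo, Impl, Selection, ModelIsP4, ModelHasSSE2 and
Select are hypothetical names made up for illustration, not Crypto++ code, and
the real IsP4() may test more than the vendor string.

```cpp
#include <string>

// Hypothetical model of what CPUID reports; these are not Crypto++ types.
struct CpuInfo {
    std::string vendor;  // vendor string from CPUID leaf 0
    bool sse2;           // SSE2 feature flag (CPUID leaf 1, EDX bit 26)
};

enum class Impl { PentiumOptimized, P4Optimized };

struct Selection {
    Impl addSub;    // implementation chosen for Add/Subtract
    Impl multiply;  // implementation chosen for the Multiply functions
};

// Simplified stand-in for IsP4(): only the vendor test described above.
bool ModelIsP4(const CpuInfo& cpu) { return cpu.vendor == "GenuineIntel"; }

// Stand-in for HasSSE2(): a pure feature-bit test, the vendor is ignored.
bool ModelHasSSE2(const CpuInfo& cpu) { return cpu.sse2; }

// Mirrors the shape of SetPentiumFunctionPointers(): two separate decisions.
Selection Select(const CpuInfo& cpu) {
    Selection s;
    s.addSub = ModelIsP4(cpu) ? Impl::P4Optimized : Impl::PentiumOptimized;
    s.multiply = ModelHasSSE2(cpu) ? Impl::P4Optimized : Impl::PentiumOptimized;
    return s;
}
```

For CpuInfo{"AuthenticAMD", true}, Select() picks PentiumOptimized for
add/subtract and P4Optimized for multiply, which is the mix we see on the AMD
boxes.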

I tried forcing all combinations of P4Optimized/PentiumOptimized
implementations for add/subtract and multiply.  It looks like the culprit is
the P4Optimized multiply functions (not Add and Subtract as I previously
thought).  When I force the PentiumOptimized implementation of the multiply
functions, the problem never occurs; when I let SetPentiumFunctionPointers()
select the P4Optimized multiply, the problem readily shows up.

I don't know if it's wrong to try to use SSE2 on AMD 64, or if there's just
some issue in the asm implementation that shows up sometimes on AMD but not
on Intel.  If it is correct to execute these on AMD 64, then I don't know
whether just one of the 3 multiply functions is the culprit or whether all of
them fail intermittently.
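One way to narrow this down would be a differential test: run each suspect
function and a trusted one on the same random inputs, many times, and count
disagreements.  The sketch below is an assumption-laden illustration, not
Crypto++ code: word is taken to be 32 bits, MulFn is assumed to match the
shape of the Multiply4/Multiply8 pointers (C gets 2n words from n-word A and
B), and ReferenceMultiply4 and CountMismatches are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

typedef std::uint32_t word;  // assumption: 32-bit words

// Assumed signature: C (2n words) = A (n words) * B (n words).
typedef void (*MulFn)(word* C, const word* A, const word* B);

// Portable schoolbook 4x4-word multiply, used as the trusted reference.
void ReferenceMultiply4(word* C, const word* A, const word* B) {
    for (int i = 0; i < 8; ++i) C[i] = 0;
    for (int i = 0; i < 4; ++i) {
        std::uint64_t carry = 0;
        for (int j = 0; j < 4; ++j) {
            // Cannot overflow 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1.
            std::uint64_t t = static_cast<std::uint64_t>(A[i]) * B[j]
                            + C[i + j] + carry;
            C[i + j] = static_cast<word>(t);
            carry = t >> 32;
        }
        C[i + 4] = static_cast<word>(carry);
    }
}

// Runs both functions on `trials` random n-word inputs and returns how many
// times their 2n-word results differ.
std::size_t CountMismatches(MulFn reference, MulFn candidate,
                            std::size_t n, std::size_t trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<word> a(n), b(n), expected(2 * n), actual(2 * n);
    std::size_t mismatches = 0;
    for (std::size_t t = 0; t < trials; ++t) {
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = static_cast<word>(rng());
            b[i] = static_cast<word>(rng());
        }
        reference(expected.data(), a.data(), b.data());
        candidate(actual.data(), a.data(), b.data());
        if (expected != actual) ++mismatches;
    }
    return mismatches;
}
```

Passing PentiumOptimized::Multiply4 as the reference and
P4Optimized::Multiply4 as the candidate (and likewise for Multiply8 with
n = 8) should show which functions disagree, assuming their signatures really
match MulFn.  Since the failure is intermittent, a large trial count on the
affected dual-core machine is probably needed.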

It does look like P4Optimized::Multiply* are flaky on AMD 64 dual core.

static void SetPentiumFunctionPointers()
{
        if (IsP4())
        {
                s_pAdd = &P4Optimized::Add;
                s_pSub = &P4Optimized::Subtract;
        }
        else
        {
                s_pAdd = &PentiumOptimized::Add;
                s_pSub = &PentiumOptimized::Subtract;
        }

#ifdef SSE2_INTRINSICS_AVAILABLE
        if (HasSSE2())
        {
                s_pMul4 = &P4Optimized::Multiply4;
                s_pMul8 = &P4Optimized::Multiply8;
                s_pMul8B = &P4Optimized::Multiply8Bottom;
        }
        else
        {
                s_pMul4 = &PentiumOptimized::Multiply4;
                s_pMul8 = &PentiumOptimized::Multiply8;
                s_pMul8B = &PentiumOptimized::Multiply8Bottom;
        }
#endif
}

--
View this message in context: 
http://www.nabble.com/Beware-asm-code-paths-on-AMD-64-dual-core-t1137479.html#a3172559
Sent from the Crypto++ forum at Nabble.com.
