------- Comment #9 from svfuerst at gmail dot com  2010-05-10 23:27 -------
Remember that isnan() is a weird type-dependent macro.  The special case I was
testing is the __float128 version.  __float128 values are passed in SSE
registers, so using SSE instructions to manipulate them can be a win.  (No x87
involved.)  Unfortunately, the SSE instruction set isn't all that orthogonal,
so using the normal 64-bit registers can be faster in some cases.  It also
isn't obvious which SSE-based algorithm is the best without testing.  Hence
all the benchmarking.
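
For illustration only, here is a minimal sketch of what a branchless NaN test
on the 64-bit integer halves of a __float128 could look like.  It is not the
code that was benchmarked for this bug; the names isnan128_bits and isnan128
are hypothetical, and the wrapper assumes a little-endian target (such as
x86-64) where the compiler provides __float128.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the IEEE binary128 value with upper 64 bits `hi` and lower
 * 64 bits `lo` is a NaN, else 0.  A NaN has all 15 exponent bits set and a
 * nonzero 112-bit mantissa, i.e. its magnitude is strictly above +Inf. */
static inline int isnan128_bits(uint64_t hi, uint64_t lo)
{
    const uint64_t inf_hi = 0x7FFF000000000000ULL;  /* upper half of +Inf */
    uint64_t abs_hi = hi & 0x7FFFFFFFFFFFFFFFULL;   /* drop the sign bit  */

    /* Fold "lo != 0" into bit 0 of the upper half: inf_hi has bit 0 clear,
     * so one unsigned compare then decides the whole 128-bit comparison. */
    return (abs_hi | (uint64_t)(lo != 0)) > inf_hi;
}

#ifdef __SIZEOF_FLOAT128__
/* Assumption: little-endian layout, so the low half is stored first. */
static inline int isnan128(__float128 x)
{
    uint64_t w[2];
    memcpy(w, &x, sizeof w);   /* reinterpret the 16 bytes as two integers */
    return isnan128_bits(w[1], w[0]);
}
#endif
```

Whether this beats an SSE-only sequence depends on the cost of moving the
value out of the SSE register, which is the kind of thing that has to be
measured.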

In this case, the resulting function is branchless, so it doesn't matter much
which particular input values you use for the timings.  However, adding extra
memory reads (such as scanning an array for input, as you describe) or writes
(via storing the output to a volatile) does change the timings.
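
A rough sketch of the two loop shapes being contrasted, assuming a
hypothetical stand-in (is_nan_bits) for the function under test.  The loop
names, the input pattern, and the use of clock() are illustrative choices,
not the harness actually used here.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the function being timed. */
static inline int is_nan_bits(uint64_t hi, uint64_t lo)
{
    uint64_t abs_hi = hi & 0x7FFFFFFFFFFFFFFFULL;
    return (abs_hi | (uint64_t)(lo != 0)) > 0x7FFF000000000000ULL;
}

/* Variant A: inputs come from the loop counter and results are only summed
 * in a local, so the loop itself adds no memory traffic. */
uint64_t run_register_only(uint64_t iters)
{
    uint64_t count = 0;
    for (uint64_t i = 0; i < iters; i++)
        count += (uint64_t)is_nan_bits(0x7FFF000000000000ULL, i);
    return count;
}

static volatile int sink;

/* Variant B: same work, but each result is also stored to a volatile,
 * which forces one memory write per iteration. */
uint64_t run_volatile_store(uint64_t iters)
{
    uint64_t count = 0;
    for (uint64_t i = 0; i < iters; i++) {
        int r = is_nan_bits(0x7FFF000000000000ULL, i);
        sink = r;
        count += (uint64_t)r;
    }
    return count;
}

/* CPU time in seconds taken by one call of fn(iters). */
double seconds_for(uint64_t (*fn)(uint64_t), uint64_t iters)
{
    clock_t start = clock();
    (void)fn(iters);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

An optimizing compiler may simplify a loop like variant A considerably, so
it is worth checking the generated assembly before trusting either number.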


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44053
