------- Comment #9 from svfuerst at gmail dot com 2010-05-10 23:27 -------
Remember that isnan() is a weird type-dependent macro. The special case I was testing is the __float128 version. __float128 values are passed in SSE registers, so using SSE instructions to manipulate them can be a win (no x87 involved). Unfortunately, the SSE instruction set isn't all that orthogonal, so working on the value in the normal 64-bit integer registers can be faster in some cases. It also isn't obvious which SSE-based algorithm is best without testing. Hence all the benchmarking.
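For illustration, here is a minimal sketch of a branchless NaN test for __float128 done entirely with 64-bit integer operations rather than SSE instructions. It assumes the IEEE 754 binary128 layout (1 sign bit, 15 exponent bits, 112 significand bits) and little-endian x86-64 word order. The names `isnan_binary128_words` and `isnan_f128` are hypothetical; this is not GCC's or libquadmath's actual code, and it has not been benchmarked against the SSE variants.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: NaN test on a binary128 bit pattern split into its
 * high and low 64-bit words. NaN means exponent == 0x7fff and a nonzero
 * 112-bit significand. No conditional jumps are needed. */
static int isnan_binary128_words(uint64_t hi, uint64_t lo)
{
    /* Drop the sign bit: NaN-ness does not depend on it. */
    hi &= UINT64_C(0x7fffffffffffffff);
    /* (lo | -lo) has its top bit set exactly when lo != 0; fold that
     * into bit 0 of hi so the low significand word is accounted for. */
    hi |= (lo | -lo) >> 63;
    /* hi is NaN iff it exceeds the +infinity pattern 0x7fff000000000000.
     * The subtraction wraps (setting the top bit) only in that case. */
    return (int)((UINT64_C(0x7fff000000000000) - hi) >> 63);
}

#if defined(__SIZEOF_FLOAT128__) && defined(__x86_64__)
/* Hypothetical wrapper: copy the __float128 out of its SSE register into
 * two 64-bit integers. On little-endian x86-64, w[0] is the low word. */
static int isnan_f128(__float128 x)
{
    uint64_t w[2];
    memcpy(w, &x, sizeof w);
    return isnan_binary128_words(w[1], w[0]);
}
#endif
```

Whether a given compiler keeps this free of branches, and whether the move from SSE to integer registers costs more than it saves, depends on the target and flags; only measurement settles that.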
In this case the resulting function is branchless, so the particular input values used for timing don't matter much. However, adding extra memory reads (such as scanning an input array, as you describe) or writes (such as storing the output to a volatile) does change the timings.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44053