------- Comment #7 from svfuerst at gmail dot com 2010-05-10 22:44 -------

Perhaps an example use case will help:
The __float128 version of isnan() is rather slow. Finding out which of several alternative implementations is faster required some benchmarking. However, writing the benchmark code requires an ever-growing number of workarounds, because gcc will (rightly) optimize everything away if given the chance. The result of the function/macro has to be used in some way; the simplest is to sum the results and print the total. The problem is that for the faster code, the overhead of the addition starts to perturb the measurements. In addition, more and more function attributes are needed to make sure the function isn't cloned, inlined or elided, and the list grows with each new gcc release.

Even that isn't enough. It is conceivable that gcc will eventually understand the benchmarked function well enough to replace the summation loop and printf with a single puts("result");. This is allowed, since the internal state of the abstract machine is never observed, only its output. For timing purposes, this is a disaster. (It doesn't happen for isnan() currently, but it does for other, simpler functions.)

In short, it would be really nice if there were a way to tell gcc that a function has a hidden side effect that matters: the total time taken by calling it. Such an attribute might just be a combination of other attributes, but given the history of the compiler, the number of component attributes will keep increasing, and it is already unwieldy.

Anyway, after much benchmarking, the following:

#include <emmintrin.h>

static __attribute__((noinline)) int fastisnan(__float128 x)
{
    /* Mask that clears the sign bit. */
    __m128i c1 = {0xffffffffffffffffull, 0x7fffffffffffffffull};
    /* 0x7fff in each of the seven mantissa words, 0x0001 in the exponent word. */
    __m128i c2 = {0x7fff7fff7fff7fffull, 0x00017fff7fff7fffull};
    __m128i x2 = *(__m128i *) &x;
    x2 &= c1;
    /* Unsigned saturating 16-bit add: bit 15 of a mantissa word becomes set
       iff that word is nonzero; bit 15 of the exponent word becomes set iff
       the exponent is all ones (0x7fff). */
    x2 = _mm_adds_epu16(c2, x2);
    /* 0xaaaa keeps the sign bit of the high byte of each 16-bit word. The
       result exceeds 0x8000 only if the exponent word's bit (0x8000) and at
       least one mantissa word's bit are both set. */
    return (_mm_movemask_epi8(x2) & 0xaaaa) > 0x8000;
}

turns out to be an order of magnitude faster than the current isnan() implementation for __float128 on x86_64.
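For reference, the condition that fastisnan() evaluates with a single saturating add can be spelled out in scalar C: a binary128 value is a NaN iff its 15-bit exponent is all ones and its 112-bit mantissa is nonzero. This is a minimal sketch, assuming gcc's __float128 on x86_64 (little-endian IEEE binary128 layout); split_q, make_q and isnan_q_scalar are hypothetical names, not part of gcc or libquadmath. It is meant as a correctness oracle when testing a SIMD variant, and makes no claim about speed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split a __float128 into its two 64-bit halves.
   Assumes x86_64 (little-endian): lo holds mantissa bits 0..63, hi holds
   mantissa bits 64..111, the 15-bit exponent and the sign bit. */
static void split_q(__float128 x, uint64_t *lo, uint64_t *hi)
{
    unsigned char bytes[16];
    memcpy(bytes, &x, sizeof bytes);
    memcpy(lo, bytes, 8);
    memcpy(hi, bytes + 8, 8);
}

/* Hypothetical helper: build a __float128 from raw halves (for testing). */
static __float128 make_q(uint64_t lo, uint64_t hi)
{
    unsigned char bytes[16];
    __float128 x;
    memcpy(bytes, &lo, 8);
    memcpy(bytes + 8, &hi, 8);
    memcpy(&x, bytes, sizeof x);
    return x;
}

/* Scalar reference: NaN iff exponent == 0x7fff and mantissa != 0. */
static int isnan_q_scalar(__float128 x)
{
    uint64_t lo, hi;
    split_q(x, &lo, &hi);
    hi &= 0x7fffffffffffffffull;                        /* clear sign, like c1 */
    uint64_t exp  = hi >> 48;                            /* biased exponent */
    uint64_t mant = (hi & 0x0000ffffffffffffull) | lo;  /* nonzero iff any mantissa bit set */
    return exp == 0x7fff && mant != 0;
}
```

Comparing fastisnan() against this on edge cases (smallest-payload NaN, both infinities, negative NaN, zero) exercises exactly the lanes the SSE trick depends on.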
Similar improvements exist for isinf() and fpclassify(); see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44053
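The bit-level conditions that faster isinf() and fpclassify() versions would need to test can be sketched the same way. This is not the code from the linked bug report, just a scalar illustration assuming gcc's __float128 on x86_64 (1 sign bit, 15 exponent bits, 112 mantissa bits, little-endian); fpclassify_q_bits and isinf_q_bits are hypothetical names. The categories are the standard C99 FP_* macros from <math.h>.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-level classifier for binary128 (x86_64 layout assumed). */
static int fpclassify_q_bits(__float128 x)
{
    uint64_t lo, hi;
    memcpy(&lo, &x, 8);
    memcpy(&hi, (const unsigned char *)&x + 8, 8);
    uint64_t exp  = (hi >> 48) & 0x7fff;                 /* drop the sign bit */
    uint64_t mant = (hi & 0x0000ffffffffffffull) | lo;  /* nonzero iff any mantissa bit set */

    if (exp == 0x7fff)
        return mant ? FP_NAN : FP_INFINITE;
    if (exp == 0)
        return mant ? FP_SUBNORMAL : FP_ZERO;
    return FP_NORMAL;
}

/* isinf as one comparison: clear the sign, then compare with +inf's bits. */
static int isinf_q_bits(__float128 x)
{
    uint64_t lo, hi;
    memcpy(&lo, &x, 8);
    memcpy(&hi, (const unsigned char *)&x + 8, 8);
    return lo == 0 && (hi & 0x7fffffffffffffffull) == 0x7fff000000000000ull;
}
```

Note that a value which is subnormal as a double (e.g. 4.9e-324) is a normal binary128 number, since binary128 has a much wider exponent range.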