------- Comment #7 from svfuerst at gmail dot com  2010-05-10 22:44 -------
Perhaps an example usage helps:

The __float128 version of isnan() is rather slow.  Trying different
implementations to see which is faster required some benchmarking.  However,
the benchmark code needs an increasing number of work-arounds, because gcc
will rightly optimize everything away if given the chance to do so.
Preventing that means using the result of the function/macro in some way, the
simplest being to sum the results and print the total.  The problem is that
for the faster code, the overhead of the addition starts to perturb the
results.  In addition, each new gcc version requires more function attributes
to make sure the function isn't cloned, inlined, or elided.
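
To make the above concrete, here is a minimal sketch of that kind of harness.
It is not my actual benchmark code: probe_isnan() and bench_sum() are
hypothetical names, and a plain double stands in for __float128 so the sketch
builds on targets without that type.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the function being benchmarked.
   noinline keeps gcc from folding the call into the loop. */
static __attribute__((noinline)) int probe_isnan(double x)
{
        return x != x;
}

/* Call the function reps * n times and sum the results, so that every
   call contributes to a value the caller can print.  The elapsed CPU
   time is stored in *seconds; note that it includes the additions. */
static long bench_sum(const double *v, long n, long reps, double *seconds)
{
        long sum = 0;
        clock_t start = clock();

        for (long r = 0; r < reps; r++)
                for (long i = 0; i < n; i++)
                        sum += probe_isnan(v[i]);

        *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
        return sum;
}
```

A caller would print both the sum and the time.  The sum is only there to
keep the calls alive, which is exactly the overhead complained about above.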

However, this isn't enough.  It is conceivable that gcc will eventually
understand the benchmarked function well enough to replace the summation
loop + printf with a single puts("result");  This is allowable since the
internal state of the abstract machine is never observed, only its output.
For timing purposes, this is a disaster.  (This doesn't currently happen for
isnan(), but it does for other, simpler functions.)

In short, it would be really nice if there were a way to tell gcc that a
function has a hidden side effect that matters: the total time taken by
calling it.  Such an attribute may only be a combination of other
attributes, but given the history of the compiler, the number of component
attributes will increase with time, and is already unwieldy.
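
For comparison, one commonly used stop-gap (not the attribute being asked
for here) is an empty extended asm statement that claims to read the result.
The sketch below assumes GCC-style extended asm; keep_result(), slow_probe()
and timed_calls() are hypothetical names, and double again stands in for
__float128.

```c
/* Hypothetical helper: an empty asm that takes the value as an input.
   Because the asm is volatile, gcc should not drop it, and so the value
   has to be computed before it.  No instructions are emitted. */
static inline void keep_result(int value)
{
        __asm__ volatile("" : : "r"(value) : "memory");
}

/* Hypothetical stand-in for the function being benchmarked. */
static __attribute__((noinline)) int slow_probe(double x)
{
        return x != x;
}

/* Call the function once per element without summing or printing
   anything; returns the number of calls made. */
static long timed_calls(const double *v, long n)
{
        long calls = 0;

        for (long i = 0; i < n; i++) {
                keep_result(slow_probe(v[i]));
                calls++;
        }
        return calls;
}
```

This avoids the addition, but it is still a per-benchmark trick that relies
on how gcc treats volatile asm, rather than a way to say that the time spent
in the function matters.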

Anyway, the result of much benchmarking shows that:
#include <emmintrin.h>
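static __attribute__((noinline)) int fastisnan(__float128 x)
{
        /* c1 clears the sign bit. */
        __m128i c1 = {0xffffffffffffffffull, 0x7fffffffffffffffull};
        /* c2 adds 0x7fff to each 16-bit mantissa word, 1 to the exponent. */
        __m128i c2 = {0x7fff7fff7fff7fffull, 0x00017fff7fff7fffull};
        __m128i x2 = *(__m128i *) &x;

        x2 &= c1;
        /* After the saturating add, the top bit of a word is set iff that
           mantissa word was nonzero, or iff the exponent was all ones. */
        x2 = _mm_adds_epu16(c2, x2);
        /* NaN: exponent bit (0x8000) plus at least one mantissa bit. */
        return (_mm_movemask_epi8(x2) & 0xaaaa) > 0x8000;
}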
is an order of magnitude faster than the current isnan() implementation for
__float128 on x86_64.  Similar improvements exist for isinf() and
fpclassify().
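
As a sanity check on the bit trick, the following sketch compares it against
the x != x definition of NaN on a few special values.  It assumes x86_64
with SSE2 and a compiler that has __float128; reference_isnan() and
count_mismatches() are hypothetical helpers, and fastisnan() is repeated
only to keep the sketch self-contained.

```c
#include <emmintrin.h>

/* Same function as in this comment. */
static __attribute__((noinline)) int fastisnan(__float128 x)
{
        __m128i c1 = {0xffffffffffffffffull, 0x7fffffffffffffffull};
        __m128i c2 = {0x7fff7fff7fff7fffull, 0x00017fff7fff7fffull};
        __m128i x2 = *(__m128i *) &x;

        x2 &= c1;
        x2 = _mm_adds_epu16(c2, x2);
        return (_mm_movemask_epi8(x2) & 0xaaaa) > 0x8000;
}

/* Reference: a NaN is the only value that compares unequal to itself. */
static int reference_isnan(__float128 x)
{
        return x != x;
}

/* Count the sample values on which the two versions disagree. */
static int count_mismatches(void)
{
        __float128 zero = 0.0;
        __float128 one = 1.0;
        __float128 inf = one / zero;
        __float128 nan = zero / zero;
        __float128 samples[] = { zero, -zero, one, -one,
                                 inf, -inf, nan, -nan };
        int n = (int)(sizeof samples / sizeof samples[0]);
        int mismatches = 0;

        for (int i = 0; i < n; i++)
                if (!!fastisnan(samples[i]) != !!reference_isnan(samples[i]))
                        mismatches++;
        return mismatches;
}
```

This only covers correctness on a handful of inputs; it says nothing about
the timings.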


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44053
