Merry xmas,

I recently had some use for -mrecip, but it turned out to come with all
sorts of strings attached and apparently no opt-out. Briefly: barring
inline asm, I can't get gcc to emit a bare rsqrtps, i.e. one without a
Newton-Raphson (NR) fixup.
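
For reference, the NR fixup in question is the textbook refinement step for a reciprocal square root estimate. A minimal scalar sketch follows. The function name is made up for illustration, and this shows the formula only, not necessarily the exact instruction sequence gcc emits; a packed version would apply the same step to each lane.

```c
/* One Newton-Raphson step refining an estimate y0 of 1/sqrt(x):
 *
 *     y1 = 0.5 * y0 * (3 - x * y0 * y0)
 *
 * rsqrt_nr_step is a hypothetical name used only for this sketch.
 * In the SSE case y0 would be the low-precision result of rsqrtps. */
float rsqrt_nr_step(float x, float y0)
{
    return 0.5f * y0 * (3.0f - x * y0 * y0);
}
```

The extra multiplies and the subtract are exactly the cost one is trying to avoid when the raw estimate is already good enough.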

# cat src/pr-recip.c
#include <xmmintrin.h>
typedef float v4sf_t __attribute__ ((__vector_size__ (16)));

__m128 foo(__m128 a) { return _mm_sqrt_ps(a); }
__m128 bar(__m128 a) { return _mm_rsqrt_ps(a); }
__m128 baz(__m128 a) { return _mm_rcp_ps(a); }

v4sf_t nope1(v4sf_t a) { return __builtin_ia32_sqrtps(a); }
v4sf_t nope2(v4sf_t a) { return __builtin_ia32_rsqrtps(a); }
v4sf_t allright(v4sf_t a) { return __builtin_ia32_rcpps(a); }

int main() { return 0; }
# /usr/local/gcc-4.3-20071221/bin/gcc -march=native -ffast-math -mrecip -O2 src/pr-recip.c
... and as can be seen in the attached asm dump, foo, bar, nope1 and
nope2 get mangled (at least on x86-64 Linux).

While I can somewhat understand the logic behind the automatic
transformation of _mm_sqrt_ps (it can be argued that's what the user
asked for), there's no obvious way to opt out. But I really don't
understand why gcc feels the urge to tinker when I specifically ask
for an rsqrt.
To add insult to injury, -mrecip, unlike -ffast-math, doesn't define
any macro, so kludging around it is a cat-and-mouse game.
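
The kind of kludge meant above might look something like the sketch below. gcc predefines __FAST_MATH__ under -ffast-math, so that half can be detected, whereas -mrecip has to be signalled by hand. HAVE_MRECIP is a made-up macro that the build itself would have to pass (say, -DHAVE_MRECIP next to -mrecip); nothing in gcc defines it, and the other names are likewise invented for illustration.

```c
/* __FAST_MATH__ is predefined by gcc when -ffast-math is in effect. */
#ifdef __FAST_MATH__
#define BUILD_HAS_FAST_MATH 1
#else
#define BUILD_HAS_FAST_MATH 0
#endif

/* HAVE_MRECIP is hypothetical: gcc does not define it, so the build
 * system has to pass -DHAVE_MRECIP whenever it also passes -mrecip. */
#ifdef HAVE_MRECIP
#define BUILD_HAS_MRECIP 1
#else
#define BUILD_HAS_MRECIP 0
#endif

/* Small accessors so the rest of the code can branch on the flags. */
int build_has_fast_math(void) { return BUILD_HAS_FAST_MATH; }
int build_has_mrecip(void)    { return BUILD_HAS_MRECIP; }
```

This only works as long as every makefile keeps the two flags in sync, hence the cat-and-mouse.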

Questions:
  a) is that really by design?
  b) what's the official way to dodge fixups when -mrecip is active?
  c) any chance for -mrecip to set __FAST_MATH_NONE_SHALL_PASS__ or something?

Attachment: dump.asm
