Merry xmas, i lately had some use for -mrecip, but it turned out to come with all sorts of strings attached and apparently no opt-out. Briefly, barring inline asm, i can't get gcc to emit a plain sqrtps or rsqrtps without a Newton-Raphson (NR) fixup.

# cat src/pr-recip.c
#include <xmmintrin.h>
typedef float v4sf_t __attribute__ ((__vector_size__ (16)));
__m128 foo(__m128 a) { return _mm_sqrt_ps(a); }
__m128 bar(__m128 a) { return _mm_rsqrt_ps(a); }
__m128 baz(__m128 a) { return _mm_rcp_ps(a); }
v4sf_t nope1(v4sf_t a) { return __builtin_ia32_sqrtps(a); }
v4sf_t nope2(v4sf_t a) { return __builtin_ia32_rsqrtps(a); }
v4sf_t allright(v4sf_t a) { return __builtin_ia32_rcpps(a); }
int main() { return 0; }
# /usr/local/gcc-4.3-20071221/bin/gcc -march=native -ffast-math -mrecip -O2 src/pr-recip.c

... and as can be witnessed in the attached asm dump foo, bar, nope1, nope2 get mangled (at least on x86-64 linux).

While i can somehow understand the logic behind the automatic transformation of _mm_sqrt_ps - it can be argued that's what the user has asked for - there's no obvious way to opt out. But then i really don't understand why gcc feels the urge to tinker when i specifically ask for a rsqrt.

To add insult to injury -mrecip, unlike fast-math, doesn't set any macro so kludging around is a cat / mouse game.

Questions:
a) is that really by design?
b) what's the official way to dodge fixups when -mrecip is active?
c) any chance for -mrecip to set __FAST_MATH_NONE_SHALL_PASS__ or something?
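
For readers unfamiliar with the "NR fixup" mentioned above, here is a minimal scalar sketch of one Newton-Raphson refinement step for a reciprocal square root estimate. The function name nr_rsqrt_step is made up for illustration, and this is the textbook formula, not necessarily the exact instruction sequence gcc emits around rsqrtps.

```c
/* Illustrative only: one Newton-Raphson step refining an estimate
 * y0 of 1/sqrt(x). nr_rsqrt_step is a hypothetical helper name,
 * not anything provided by gcc or xmmintrin.h. */
static float nr_rsqrt_step(float x, float y0)
{
    /* Newton's method applied to f(y) = 1/(y*y) - x gives
     * y1 = y0 * (3 - x*y0*y0) / 2. */
    return 0.5f * y0 * (3.0f - x * y0 * y0);
}
```

Feeding it x = 4 and a rough estimate of 0.49 returns roughly 0.4997, i.e. closer to the exact 0.5. The extra multiplies and the subtract are the cost one would want to skip when the raw estimate is already good enough.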
dump.asm
Description: Binary data