https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
A testcase variant using __restrict was confirmed:

#include <x86intrin.h>

void bench_3(float * __restrict out, float * __restrict in, float f,
             unsigned int n)
{
  n /= 8;
  __m128 scalar = _mm_set_ps1(f);
  do
    {
      __m128 arg = _mm_load_ps(in);
      __m128 result = _mm_add_ps(arg, scalar);
      _mm_store_ps(out, result);

      arg = _mm_load_ps(in+4);
      result = _mm_add_ps(arg, scalar);
      _mm_store_ps(out+4, result);
      in += 8;
      out += 8;
    }
  while (--n);
}

GCC 4.6 and later optimize this when -frename-registers is given; on trunk
that option is now enabled by default.

Fixed thus.
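
For anyone who wants to reproduce the observation, here is a minimal sketch
that wraps the test case above so it can be compiled to assembly and
sanity-checked. The helper check_bench_3, the buffer size of 32 and the
addend 1.5f are illustrative assumptions, not part of the report. As in the
original, n is assumed to be a non-zero multiple of 8 and both pointers are
assumed to be 16-byte aligned.

```c
#include <x86intrin.h>

/* Test case from the comment above; only the layout differs. */
void bench_3(float * __restrict out, float * __restrict in, float f,
             unsigned int n)
{
  n /= 8;
  __m128 scalar = _mm_set_ps1(f);
  do
    {
      __m128 arg = _mm_load_ps(in);
      __m128 result = _mm_add_ps(arg, scalar);
      _mm_store_ps(out, result);

      arg = _mm_load_ps(in+4);
      result = _mm_add_ps(arg, scalar);
      _mm_store_ps(out+4, result);
      in += 8;
      out += 8;
    }
  while (--n);
}

/* Hypothetical helper (not in the report): runs bench_3 on small
   16-byte-aligned buffers and returns the number of wrong elements. */
int check_bench_3(void)
{
  enum { N = 32 };  /* assumed size; must be a non-zero multiple of 8 */
  static float in[N] __attribute__((aligned(16)));
  static float out[N] __attribute__((aligned(16)));
  int i, bad = 0;

  for (i = 0; i < N; i++)
    in[i] = (float) i;

  bench_3(out, in, 1.5f, N);

  /* Small integers plus 1.5 are exactly representable as float,
     so an exact comparison is safe here. */
  for (i = 0; i < N; i++)
    if (out[i] != in[i] + 1.5f)
      bad++;

  return bad;
}
```

Assuming the file is saved as bench.c (a hypothetical name), compiling it
once with "gcc -O2 -S bench.c" and once with "gcc -O2 -frename-registers -S
bench.c" and comparing the loop bodies in the two assembly outputs should
show whether the compiler in use exhibits the difference described above.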
