https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825
Richard Biener <rguenth at gcc dot gnu.org> changed:
What       |Removed |Added
----------------------------------------------------------------------------
Status     |NEW     |RESOLVED
Resolution |---     |FIXED
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
A testcase variant using __restrict was confirmed:
#include <x86intrin.h>
void bench_3(float * __restrict out, float * __restrict in, float f,
             unsigned int n)
{
  n /= 8;
  __m128 scalar = _mm_set_ps1(f);
  do
    {
      __m128 arg = _mm_load_ps(in);
      __m128 result = _mm_add_ps(arg, scalar);
      _mm_store_ps(out, result);
      arg = _mm_load_ps(in+4);
      result = _mm_add_ps(arg, scalar);
      _mm_store_ps(out+4, result);
      in += 8;
      out += 8;
    }
  while (--n);
}
This is optimized by GCC 4.6 and later when -frename-registers is given, and
on trunk, where -frename-registers is now enabled by default.
Fixed thus.
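
For anyone re-checking the testcase, a scalar reference may be handy. The sketch below is not part of the report: bench_3_scalar is a hypothetical helper name, and it only mirrors what bench_3 appears to compute (add f to each element, in blocks of 8 floats). Like the original it assumes n >= 8, since the do/while body runs at least once.

```c
#include <stddef.h>

/* Hypothetical scalar reference for bench_3; not from the bug report.
   Processes n / 8 blocks of 8 floats, adding f to each element.
   Assumes n >= 8, as the original do/while loop does. */
void bench_3_scalar(float * __restrict out, const float * __restrict in,
                    float f, unsigned int n)
{
    n /= 8;
    do {
        for (size_t i = 0; i < 8; ++i)
            out[i] = in[i] + f;
        in += 8;
        out += 8;
    } while (--n);
}
```

Comparing its output with bench_3 on 16-byte-aligned buffers (required by _mm_load_ps and _mm_store_ps) would be one way to confirm that builds with and without -frename-registers still produce the same results.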