https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124288

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
With

  vf2 = fltmax = __FLT_MAX__;// 2U * 0x7fffffff + 1 + 1.0f;
  //for (vf = 1.0f; (fltmax = vf2 - vf) == vf2; vf = vf * 2.0f)
  //  ;

the reduced testcase again passes with -march=x86-64-v3 but still fails with
-march=x86-64-v4 (with --param vect-epilogues-nomask=0 -mprefer-vector-width=128
you get the same single SSE width vectorization).

But the failure is at a different point:

254 254 != 254 (254.250000)
255 255 != 255 (255.250000)
256 4294967295 != 0 (170141173319264429905852091742258462720.000000)
Aborted (core dumped)

correct(?):

255 255 != 255 (255.250000)
256 0 != 0 (170141173319264429905852091742258462720.000000)

that's the first

        f[i] = (fltmax + fltmin) / 2.0 - 1024 * 8 + 16.0f * i;
