https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124288

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Works with -fexcess-precision=standard or -mfpmath=sse.

4294967295 != 0 (4294967296.000000)
Aborted (core dumped)

The only loop we vectorize is

__attribute__ ((noinline, noclone)) void
f2ui (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    ui[i] = f[i];
}

which uses

.L2:
        vcvttps2udq     f(%eax), %xmm0
        addl    $16, %eax
        vmovdqa %xmm0, ui-16(%eax)
        cmpl    $4096, %eax
        jne     .L2

with -v4 (i.e. -march=x86-64-v4) and

f2ui:
.LFB22:
        .cfi_startproc
        vbroadcastss    .LC1, %xmm2
        xorl    %eax, %eax
        .p2align 6
        .p2align 4
        .p2align 3
.L2:
        vmovaps f(%eax), %xmm0
        addl    $16, %eax
        vcmpleps        %xmm0, %xmm2, %xmm1
        vandps  %xmm2, %xmm1, %xmm3
        vpslld  $31, %xmm1, %xmm1
        vsubps  %xmm3, %xmm0, %xmm0
        vcvttps2dq      %xmm0, %xmm0
        vpxor   %xmm1, %xmm0, %xmm0
        vmovdqa %xmm0, ui-16(%eax)
        cmpl    $4096, %eax
        jne     .L2

with -v3 (i.e. -march=x86-64-v3), which works without -mfpmath=sse.

Instrumenting with

  f2ui ();
  for (i = 0; i < 1024; i++)
    {
      fprintf (stderr, "%i %u != %u (%f)\n", i, ui[i], (unsigned int)f[i],
               f[i]);
      if (ui[i] != (__typeof (ui[0]))f[i])
        abort ();
    }

With -v4:

892 4294967040 != 4294967040 (4294967040.000000)
893 4294967040 != 4294967040 (4294967040.000000)
894 4294967040 != 4294967040 (4294967040.000000)
895 4294967295 != 0 (4294967296.000000)
Aborted (core dumped)

with -v3:

892 4294967040 != 4294967040 (4294967040.000000)
893 4294967040 != 4294967040 (4294967040.000000)
894 4294967040 != 4294967040 (4294967040.000000)
895 0 != 0 (4294967296.000000)
896 0 != 0 (4294967296.000000)
...
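
The differing values for element 895 are consistent with how the two
instruction sequences treat an out-of-range input. The following is a rough
scalar C model of one lane of each loop, not the compiler's actual expansion.
It assumes that .LC1 holds 2147483648.0f (2^31), which the listing does not
show, and that out-of-range inputs make vcvttps2udq return 0xFFFFFFFF and
vcvttps2dq return 0x80000000. The function names are made up for illustration.

```c
#include <stdint.h>

/* One lane of vcvttps2udq (the -v4 loop): truncate toward zero;
   out-of-range or NaN inputs give 0xFFFFFFFF.  */
uint32_t
model_cvttps2udq (float x)
{
  if (!(x > -1.0f && x < 4294967296.0f))
    return UINT32_MAX;
  return (uint32_t) x;
}

/* One lane of vcvttps2dq: signed truncation; out-of-range or NaN
   inputs give 0x80000000.  */
uint32_t
model_cvttps2dq (float x)
{
  if (!(x >= -2147483648.0f && x < 2147483648.0f))
    return UINT32_C (0x80000000);
  return (uint32_t) (int32_t) x;
}

/* One lane of the -v3 loop.  ASSUMPTION: .LC1 is 2^31.  */
uint32_t
model_v3_loop (float x)
{
  const float two31 = 2147483648.0f;
  uint32_t mask = (two31 <= x) ? UINT32_MAX : 0;  /* vcmpleps */
  float bias = mask ? two31 : 0.0f;               /* vandps */
  uint32_t flip = mask << 31;                     /* vpslld $31 */
  /* vsubps, vcvttps2dq, vpxor */
  return model_cvttps2dq (x - bias) ^ flip;
}
```

Under these assumptions model_cvttps2udq (4294967296.0f) gives 4294967295 and
model_v3_loop (4294967296.0f) gives 0, matching the two logs, while both agree
on 4294967040 for 4294967040.0f.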

It seems the testcase computes fltmax in an odd way and we run into saturation,
possibly hitting an undefined float to unsigned int conversion (overflow)?
Element 895 is 4294967296.0 (2^32), which does not fit in a 32-bit unsigned int.
Possibly the test should use __FLT_MAX__ and friends instead of that
weird computation.
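
As a sketch of why the comparison itself is questionable: in ISO C, converting
a floating value to an integer type is undefined when the truncated value is
not representable, so (unsigned int)f[i] has no required result for
4294967296.0. A check along the following lines could restrict the test to
in-range inputs. The helper name is hypothetical and the code assumes a 32-bit
unsigned int.

```c
#include <limits.h>
#include <stdbool.h>

/* ASSUMPTION: unsigned int is 32 bits wide, so the exclusive upper
   bound for a defined conversion is 2^32.  */
_Static_assert (UINT_MAX == 0xFFFFFFFFu, "expects 32-bit unsigned int");

/* Hypothetical helper: true if (unsigned int) x is well defined.
   Values in (-1, 0) truncate to 0 and are fine; NaN fails both
   comparisons and is rejected.  */
bool
float_fits_uint (float x)
{
  return x > -1.0f && x < 4294967296.0f;
}
```

With such a guard, 4294967040.0f would still be checked, while 4294967296.0f
would be skipped rather than compared against an unspecified result.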

Jakub, you wrote the testcase.
