On 13 Mar 2016, at 21:10, Steve Kargl <[email protected]> wrote:
> On Sun, Mar 13, 2016 at 09:03:57PM +0100, Dimitry Andric wrote:
...
>> So it's storing the intermediate result in a double, for some reason.
>> The fnstsw will then result in zero, since there was no underflow at
>> that point.
>> 
>> I will submit a bug for this upstream, thanks for the report.

Submitted upstream as: https://llvm.org/bugs/show_bug.cgi?id=26931


> Thanks for the quick reply.  But, it must be using an 80-bit
> extended double instead of a double for storage.  This variation
> 
> #include <fenv.h>
> #include <stdio.h>
> 
> int
> main(void)
> {
>   int i;
> //   float x = 1.f;
>   double x = 1.;
>   i = 0;
>   feclearexcept(FE_ALL_EXCEPT);
>   do {
>      x /= 2;
>      i++;
>   } while(!fetestexcept(FE_UNDERFLOW));
>   if (fetestexcept(FE_UNDERFLOW)) printf("FE_UNDERFLOW: ");
>   printf("x = %e after %d iterations\n", x, i);
> 
>   return 0;
> }
> 
> yields
> 
> % cc -O -o z b.c -lm && ./z
> FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations
> 
> It should be 1075 iterations.
> 
> Note, there is a similar issue with OVERFLOW.  The upshot is
> that clang on current is probably miscompiling libm.

With this example, I also get different results from gcc (4.8.5),
depending on the optimization level:

$ gcc -O underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 1075 iterations
$ gcc -O2 underflow-iter.c -o underflow-iter-gcc -lm
$ ./underflow-iter-gcc
FE_UNDERFLOW: x = 0.000000e+00 after 16435 iterations

Similar for the overflow case:

$ gcc -O overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 1024 iterations
$ gcc -O2 overflow-iter.c -o overflow-iter-gcc -lm
$ ./overflow-iter-gcc
FE_OVERFLOW: x = inf after 16384 iterations
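
As a sanity check on these four counts, here is a minimal sketch of the
arithmetic.  It assumes the x87 register has the extended exponent range
(minimum exponent -16381 and maximum 16384, in <float.h> terms) while the
precision control is set to a 53-bit significand, which I believe is the
FreeBSD/i386 default.  The constants and helper names are my own, not
taken from the test programs.

```c
#include <float.h>

/* Assumed x87 extended exponent range, in <float.h> terms.  Hard-coded
 * on purpose, because LDBL_MIN_EXP/LDBL_MAX_EXP differ between platforms. */
enum { X87_MIN_EXP = -16381, X87_MAX_EXP = 16384 };

/* Assumed x87 precision-control setting: 53-bit significand. */
enum { X87_PC_MANT_DIG = 53 };

/*
 * Starting from 1.0, the n-th halving gives 2^-n.  The smallest subnormal
 * is 2^(min_exp - mant_dig), so the first halving that cannot be
 * represented exactly (and so raises underflow) is one step past it.
 */
int
expected_halvings(int min_exp, int mant_dig)
{
	return (mant_dig - min_exp + 1);
}

/*
 * Starting from 1.0, the n-th doubling gives 2^n.  The largest finite
 * value is just below 2^max_exp, so the doubling that reaches 2^max_exp
 * is the one that overflows.
 */
int
expected_doublings(int max_exp)
{
	return (max_exp);
}
```

Under those assumptions, expected_halvings(DBL_MIN_EXP, DBL_MANT_DIG) is
1075 and expected_halvings(X87_MIN_EXP, X87_PC_MANT_DIG) is 16435, while
expected_doublings() gives 1024 for DBL_MAX_EXP and 16384 for X87_MAX_EXP.
So the -O2 counts are consistent with the value never leaving the x87
register.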

Are we depending on some sort of subtle undefined behavior here?  For the
overflow case, with -O the 'main loop' becomes:

.L3:
        fld1
        fstpl   24(%esp)
        movl    $0, %ebx
.L8:
        fldl    24(%esp)
        fld     %st(0)
        faddp   %st, %st(1)
        fstpl   24(%esp)
        addl    $1, %ebx
        fnstsw  %ax
        movl    %eax, %esi
        movl    __has_sse, %eax
        testl   %eax, %eax
        je      .L4
        cmpl    $2, %eax
        jne     .L5
        call    __test_sse
        testl   %eax, %eax
        je      .L5
.L4:
        stmxcsr 44(%esp)
        jmp     .L6
.L5:
        movl    $0, 44(%esp)
.L6:
        orl     44(%esp), %esi
        testl   $8, %esi
        je      .L8

With -O2, it becomes:

.L3:
        fld1
        xorl    %ebx, %ebx
.L12:
        fadd    %st(0), %st
        addl    $1, %ebx
        fnstsw  %ax
        testl   %edx, %edx
        movl    %eax, %esi
        je      .L10
        cmpl    $2, %edx
        je      .L27
.L9:
        xorl    %eax, %eax
.L8:
        orl     %eax, %esi
        andl    $8, %esi
        je      .L12

So it switches from using faddp and fstpl to a direct fadd of %st(0) and
%st, without the store to memory.  I assume that uses the internal 80-bit
precision?  gcc also manages to move the __has_sse stuff further down in
the function, but that does not really affect the result.

-Dimitry
