------- Comment #12 from developer at sandoe-acoustics dot co dot uk  
2009-12-08 23:40 -------
(In reply to comment #11)
> I think I understand why apple gcc42 does not show the problem: it does not
> call ___divdc3:
> 
> [macbook] f90/bug% diff -up pr42333_42.s pr42333_45.s
> --- pr42333_42.s        2009-12-08 23:00:29.000000000 +0100
> +++ pr42333_45.s        2009-12-08 23:00:07.000000000 +0100
> ...
> @@ -15,68 +15,61 @@ LCFI2:
>         movq    %rax, -16(%rbp)
>         movabsq $9214364837600034815, %rax
>         movq    %rax, -8(%rbp)
> -       movq    -16(%rbp), %rax
> -       movq    -8(%rbp), %rdx
> -       movq    %rax, -24(%rbp)
> -       movsd   -24(%rbp), %xmm1
> +       movq    -16(%rbp), %rdx
> +       movq    -8(%rbp), %rax
>         movq    %rdx, -24(%rbp)
>         movsd   -24(%rbp), %xmm0
> -       movapd  %xmm0, %xmm2
> -       addsd   %xmm1, %xmm2
> -       movapd  %xmm0, %xmm3
> -       subsd   %xmm1, %xmm3
> -       movsd   LC1(%rip), %xmm0
> -       movapd  %xmm2, %xmm1
> -       divsd   %xmm0, %xmm1
> -       movsd   LC1(%rip), %xmm0
> -       movapd  %xmm3, %xmm2
> -       divsd   %xmm0, %xmm2
> -       movapd  %xmm2, %xmm0
> -       movsd   %xmm1, -24(%rbp)
> -       movq    -24(%rbp), %rax
> +       movq    %rax, -24(%rbp)
> +       movsd   -24(%rbp), %xmm1
> +       movsd   LC2(%rip), %xmm3
> +       movsd   LC2(%rip), %xmm2
> +       call    ___divdc3
>         movsd   %xmm0, -24(%rbp)
>         movq    -24(%rbp), %rdx
> ...
> 
> This also explains why the test compiled with -c and 4.5, but linked with 4.2
> fails. So my guess about the lazy complex division in libm seems right. Could
> someone write C code forcing the use of ___divdc3?

Hmm, indeed. In fact, Apple's gcc-4.0 does not call ___divdc3 either; in a
quick try at varying the options I couldn't find a case that forces either
compiler to call it.

As for generating a test case: why not just use the asm generated by 4.5?

I'll crank up a mini with D10 tomorrow (if possible). If the asm gives a fault
on D10 with 4.2 then that should be a file-able radar.
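
Alternatively, a minimal C sketch that reaches the routine without relying on
the compiler's choice of expansion is to declare it by hand and call it
directly. The prototype below is the one given for __divdc3 in the GCC
internals manual (it returns (a + ib) / (c + id)); that the runtime being
linked exports the symbol is an assumption, and force_divdc3 and close_enough
are hypothetical helper names used only for illustration. On Darwin the C name
__divdc3 appears as ___divdc3 in the asm.

```c
/* Assumed prototype of the libgcc complex-division helper:
   returns (a + i*b) / (c + i*d). Declared by hand, so this only links
   if the runtime library actually exports the symbol. */
extern double _Complex __divdc3(double a, double b, double c, double d);

/* Hypothetical wrapper: always goes through the library routine,
   whatever the compiler would have inlined for a plain '/'. */
double _Complex force_divdc3(double a, double b, double c, double d)
{
    return __divdc3(a, b, c, d);
}

/* Hypothetical helper: returns 1 when |x - y| <= tol (no libm needed). */
int close_enough(double x, double y, double tol)
{
    double diff = x - y;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tol;
}
```

force_divdc3(1.0, 2.0, 3.0, 4.0) should give a quotient close to 0.44 + 0.08i.
Substituting the operands of the failing test would show whether the library
routine itself misbehaves.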

It seems likely that there are two things here: 1. we seem to be generating
(probably) less efficient code than older versions of the compiler, and 2.
the ___divdc3 in /usr/lib/libSystem is possibly faulty.

Has anyone tried this on PPC?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42333
