https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124434

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Brian M. Sutin from comment #6)
> It's barfing up the pipeline on every loop iteration for only long double,
> and the -O1 optimizer knows how to fix the issue.

Yes.

For double we have at -O0:
```
.L3:
        movsd   -8(%rbp), %xmm0
        mulsd   -24(%rbp), %xmm0
        movsd   -32(%rbp), %xmm1
        addsd   %xmm1, %xmm0
        movsd   %xmm0, -8(%rbp)
        addl    $1, -12(%rbp)
.L2:
        cmpl    $999999999, -12(%rbp)
        jle     .L3
```

The SSE stores/loads get store-to-load forwarding (a load bypass), so the load from `-8(%rbp)` picks up the previous iteration's store quickly and the loop performs decently even at -O0.
