On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote:
> I noticed that a simple function like
> auto relu( float x ) {
>     return x > 0.f ? x : 0.f;
> }
> compiles to different assembly with GCC 11 (or lower) and GCC 12 (or
> higher). With -O3 -mavx2 the former compiles the above function to

Reports like this belong in Bugzilla (gcc.gnu.org/bugzilla/), not on the
mailing list, if you are convinced that loading the constant from memory is
faster.
Another possibility is
        vxorps xmm1, xmm1, xmm1
        vmaxss xmm0, xmm0, xmm1
        ret
which doesn't need to wait for a load from memory.
The code generation changed with https://gcc.gnu.org/r12-7693
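
For reference, a minimal sketch of why a single vmaxss with zero as its
second source operand can stand in for the ternary: the x86 max instruction
returns the second source operand when the inputs compare equal as zeros or
when one of them is a NaN, and the ternary yields 0.f in those same cases.
The helper names is_positive_zero and check_relu_edge_cases are hypothetical,
added only for illustration, and the explicit float return type replaces the
auto from the report. This assumes default floating-point semantics (no
-ffast-math).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Same body as the function in the report.
float relu(float x) {
    return x > 0.f ? x : 0.f;
}

// Hypothetical helper: true only for +0.0f, not for -0.0f.
bool is_positive_zero(float v) {
    return v == 0.f && !std::signbit(v);
}

// Hypothetical helper: checks the edge cases that matter for the
// max-with-zero form.
void check_relu_edge_cases() {
    // Positive input passes through unchanged.
    assert(relu(2.5f) == 2.5f);
    // Negative input, including -0.0f, yields +0.0f.
    assert(is_positive_zero(relu(-1.0f)));
    assert(is_positive_zero(relu(-0.0f)));
    // NaN compares false against 0.f, so the result is +0.0f.
    assert(is_positive_zero(relu(std::numeric_limits<float>::quiet_NaN())));
}
```

Calling check_relu_edge_cases() from main in a build without -DNDEBUG should
pass silently.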

> 
> relu(float):
>     vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
>     ret
> .LC0:
>     .long 0
> 
> which is what I would naively expect and what also clang essentially does
> (clang actually uses an xor before the maxss to get the zero). The latter,
> however, compiles the function to
> 
> relu(float):
>     vxorps xmm1, xmm1, xmm1
>     vcmpltss xmm2, xmm1, xmm0
>     vblendvps xmm0, xmm1, xmm0, xmm2
>     ret
> 
> which looks like a missed optimisation. Does anyone know if there's a
> reason for the changed behaviour?

        Jakub
