Hi,
I noticed that a simple function like
auto relu( float x ) {
    return x > 0.f ? x : 0.f;
}
compiles to different assembly with GCC 11 (or older) and GCC 12 (or
newer). With -O3 -mavx2, the former compiles the above function to

relu(float):
    vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
    ret
.LC0:
    .long 0

which is what I would naively expect and essentially what clang does as
well (clang uses an xor before the maxss to obtain the zero). GCC 12 and
newer, however, compile the function to

relu(float):
    vxorps xmm1, xmm1, xmm1
    vcmpltss xmm2, xmm1, xmm0
    vblendvps xmm0, xmm1, xmm0, xmm2
    ret

which looks like a missed optimisation. Does anyone know if there's a
reason for the changed behaviour?
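
For reference, here is a minimal sketch of the edge cases that any
correct code sequence for relu has to preserve, assuming IEEE 754 floats
and no -ffast-math. The helper relu_edge_cases_ok is a hypothetical name
used only for illustration. As far as I understand, maxss returns its
second source operand when either input is NaN or both are zero, so with
x as the first source and 0 as the second, the single vmaxss should
already satisfy all of these checks.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

auto relu( float x ) {
    return x > 0.f ? x : 0.f;
}

// Hypothetical helper (not part of the original report): returns true
// if relu handles the usual floating-point edge cases as the ternary
// expression dictates.
bool relu_edge_cases_ok() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float inf = std::numeric_limits<float>::infinity();

    bool ok = true;
    ok = ok && relu( 1.5f ) == 1.5f;             // positive passes through
    ok = ok && relu( -2.0f ) == 0.f;             // negative clamps to zero
    ok = ok && relu( inf ) == inf;               // +inf passes through
    ok = ok && relu( -inf ) == 0.f;              // -inf clamps to zero
    ok = ok && relu( nan ) == 0.f;               // NaN > 0.f is false
    ok = ok && !std::signbit( relu( -0.0f ) );   // -0.0f > 0.f is false, so +0.0f
    return ok;
}
```

Calling relu_edge_cases_ok() from a small main() and building it with
both compiler versions would be one way to confirm that the two
sequences agree on these inputs.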

Andre
