On Mon, 19 Jun 2023, André Günther via Gcc wrote:
> I noticed that a simple function like
>
>   auto relu( float x ) {
>     return x > 0.f ? x : 0.f;
>   }
>
> compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). With
> -O3 -mavx2 the former compiles the above function to
>
>   relu(float):
>           vmaxss  xmm0, xmm0, DWORD PTR .LC0[rip]
>           ret
>   .LC0:
>           .long   0
>
> which is what I would naively expect and what clang essentially does too
> (clang actually uses an xor before the maxss to get the zero). The latter,
> however, compiles the function to
>
>   relu(float):
>           vxorps  xmm1, xmm1, xmm1
>           vcmpltss        xmm2, xmm1, xmm0
>           vblendvps       xmm0, xmm1, xmm0, xmm2
>           ret
>
> which looks like a missed optimisation. Does anyone know if there's a
> reason for the changed behaviour?

With -fno-signed-zeros -ffinite-math-only, gcc-12 still uses max instead
of cmp+blend. So the first thing to check would be if both versions give
the same result on negative 0 and NaN.
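
A rough way to run that check without looking at the generated code is to
model both instruction sequences in plain C++ and compare them with the
source function on the edge cases. This is only a sketch: maxss_model,
cmp_blend_model, same_value and check_relu_edge_cases are names made up for
the illustration, and the models assume the documented x86 behaviour that
max returns its second source operand when an input is NaN or when both
inputs are zeros. Build it without -ffast-math, otherwise the compiler may
fold away exactly the cases being tested.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The function from the original post.
float relu(float x) { return x > 0.f ? x : 0.f; }

// Hypothetical scalar model of `vmaxss dst, a, b`: the result is a when
// a > b, and b otherwise (so b wins for NaN inputs and for two zeros).
float maxss_model(float a, float b) { return a > b ? a : b; }

// Hypothetical scalar model of the GCC 12 sequence: compare 0 < x, then blend.
float cmp_blend_model(float x) {
    const float zero = 0.f;
    const bool mask = zero < x;  // vcmpltss: false for NaN and for -0.0
    return mask ? x : zero;      // vblendvps: pick x where the mask is set
}

// Equality that distinguishes +0.0 from -0.0 and treats all NaNs as equal.
bool same_value(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) return std::isnan(a) && std::isnan(b);
    return a == b && std::signbit(a) == std::signbit(b);
}

void check_relu_edge_cases() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float inputs[] = {-0.f, 0.f, nan, -1.5f, 2.5f};
    for (float x : inputs) {
        // With x as the first operand, both models agree with the source.
        assert(same_value(maxss_model(x, 0.f), relu(x)));
        assert(same_value(cmp_blend_model(x), relu(x)));
    }
    // With the operands swapped, max lets NaN and -0.0 through, so the
    // operand order matters unless the flags above allow ignoring them.
    assert(std::isnan(maxss_model(0.f, nan)));
    assert(std::signbit(maxss_model(0.f, -0.f)));
}
```

Calling check_relu_edge_cases() from main exercises the comparison. Under
these models the two sequences agree on negative 0 and NaN, but the real
test is still to run the code from both compilers on those inputs.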
--
Marc Glisse