On Mon, 19 Jun 2023, André Günther via Gcc wrote:

I noticed that a simple function like
auto relu( float x ) {
   return x > 0.f ? x : 0.f;
}
compiles to different assembly with GCC 11 (or lower) and GCC 12 (or higher). With
-O3 -mavx2, the former compiles the above function to

relu(float):
   vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
   ret
.LC0:
   .long 0

which is what I would naively expect, and essentially what clang does too
(clang uses an xor to materialise the zero before the maxss). The latter,
however, compiles the function to

relu(float):
   vxorps xmm1, xmm1, xmm1
   vcmpltss xmm2, xmm1, xmm0
   vblendvps xmm0, xmm1, xmm0, xmm2
   ret

which looks like a missed optimisation. Does anyone know whether there is a
reason for the changed behaviour?

With -fno-signed-zeros -ffinite-math-only, gcc-12 still uses max instead of cmp+blend, so the first thing to check would be whether both versions give the same result for negative zero and NaN.
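That check can be sketched in plain C++. This is a minimal illustration, not code from the thread: maxss_model and cmp_blend_model are hypothetical helpers that restate the two assembly sequences above as scalar C++. They assume Intel's documented MAXSS rule (when the operands compare equal, as -0.0 and +0.0 do, or when either is NaN, the second operand is returned) and that VBLENDVPS selects x wherever the (0 < x) mask is set. std_max_x_first and same_float are likewise illustrative names.

```cpp
#include <algorithm>  // std::max
#include <cassert>
#include <cmath>      // std::signbit, std::isnan
#include <limits>

// The function from the original message.
float relu(float x) {
    return x > 0.f ? x : 0.f;
}

// Hypothetical scalar model of "vmaxss xmm0, xmm0, [0.0]" (GCC 11):
// MAXSS returns the SECOND operand on ties (e.g. -0.0 vs +0.0) and
// whenever either operand is NaN, which is exactly "a > b ? a : b".
float maxss_model(float a, float b) {
    return a > b ? a : b;
}

// Hypothetical scalar model of the GCC 12 sequence.
float cmp_blend_model(float x) {
    const float zero = 0.f;      // vxorps: materialise +0.0
    const bool mask = zero < x;  // vcmpltss: false for -0.0 and for NaN
    return mask ? x : zero;      // vblendvps: pick x where the mask is set
}

// std::max(a, b) is specified as (a < b) ? b : a, so it returns the
// FIRST operand on ties and when the comparison involves NaN.
float std_max_x_first(float x) {
    return std::max(x, 0.f);
}

// Treats two NaNs as equal and distinguishes -0.0 from +0.0.
bool same_float(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) return std::isnan(a) && std::isnan(b);
    return a == b && std::signbit(a) == std::signbit(b);
}

void check_relu_edge_cases() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float inputs[] = {-1.f, -0.f, 0.f, 2.5f, nan};
    for (float x : inputs) {
        // Both modelled sequences agree with the source expression.
        assert(same_float(relu(x), maxss_model(x, 0.f)));
        assert(same_float(relu(x), cmp_blend_model(x)));
    }
    // relu maps both -0.0 and NaN to +0.0.
    assert(relu(-0.f) == 0.f && !std::signbit(relu(-0.f)));
    assert(relu(nan) == 0.f && !std::signbit(relu(nan)));
    // Operand order matters: std::max(x, 0.f) keeps -0.0 and NaN.
    assert(std::signbit(std_max_x_first(-0.f)));
    assert(std::isnan(std_max_x_first(nan)));
}
```

Under these assumptions both outputs match the source expression on -0.0 and NaN, while a max with the operands the other way round (as std::max(x, 0.f) has them) would not; the sketch says nothing about why GCC 12 chose cmp+blend.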

--
Marc Glisse
