On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote:
> I noticed that a simple function like
> auto relu( float x ) {
>     return x > 0.f ? x : 0.f;
> }
> compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On
> -O3 -mavx2 the former compiles above function to
Such reports should go into gcc.gnu.org/bugzilla/, not to the mailing list,
if you are convinced that loading the constant from memory is faster.
Another possibility is
	vxorps xmm1, xmm1, xmm1
	vmaxss xmm0, xmm0, xmm1
	ret
which doesn't need to wait for the memory.
This changed with https://gcc.gnu.org/r12-7693

>
> relu(float):
>   vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
>   ret
> .LC0:
>   .long 0
>
> which is what I would naively expect and what also clang essentially does
> (clang actually uses an xor before the maxss to get the zero). The latter,
> however, compiles the function to
>
> relu(float):
>   vxorps xmm1, xmm1, xmm1
>   vcmpltss xmm2, xmm1, xmm0
>   vblendvps xmm0, xmm1, xmm0, xmm2
>   ret
>
> which looks like a missed optimisation. Does anyone know if there's a
> reason for the changed behaviour?

	Jakub
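
For anyone comparing the two code sequences, the following is a minimal
sketch of a source-level check on the inputs where a max instruction and a
compare-and-blend could plausibly differ: NaN and negative zero. It assumes
a build without -ffast-math. relu_edge_cases_ok is a hypothetical helper
name used only for illustration, and relu is given an explicit return type
instead of auto. As far as I understand, vmaxss returns its second source
operand when an input is NaN or when both inputs are zero, which matches
what the ternary yields when zero is that second operand.

```cpp
#include <cmath>
#include <limits>

// The function from the report (explicit return type instead of auto).
float relu(float x) {
    return x > 0.f ? x : 0.f;
}

// Hypothetical helper, not part of the report: returns true if relu()
// produces the results the source-level comparison implies.
bool relu_edge_cases_ok() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return relu(1.5f) == 1.5f            // positive input passes through
        && relu(-2.0f) == 0.0f           // negative input clamps to zero
        && relu(nan) == 0.0f             // NaN > 0 is false, so 0 is returned
        && !std::signbit(relu(-0.0f));   // -0 > 0 is false, so +0 is returned
}
```

Calling relu_edge_cases_ok() from a test built with both GCC 11 and GCC 12
at -O3 -mavx2 should return true in each case if the change affects only
the instruction selection and not the results.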