https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91722
Bug ID: 91722
Summary: gcc generates sub-optimal assembly when AVX instructions are used.
Product: gcc
Version: 9.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: maxim.yegorushkin at gmail dot com
Target Milestone: ---
The following code:
#include <immintrin.h>
// Returns the sign bits of `from` combined with the magnitude bits of `to`.
__m256 copysign_ps(__m256 from, __m256 to) {
    constexpr float signbit = -0.f;
    auto const avx_signbit = _mm256_broadcast_ss(&signbit);
    return _mm256_or_ps(_mm256_and_ps(avx_signbit, from),
                        _mm256_andnot_ps(avx_signbit, to));
}
When `copysign_ps` is compiled with `g++-9.2 -O2 -mavx -std=c++11`, GCC produces the following assembly:
copysign_ps(float __vector(8), float __vector(8)):
        push    rbp
        vmovaps ymm2, ymm0
        mov     rbp, rsp
        and     rsp, -32
        vbroadcastss ymm0, DWORD PTR .LC0[rip]
        vandnps ymm1, ymm0, ymm1
        vandps  ymm0, ymm0, ymm2
        vorps   ymm0, ymm0, ymm1
        leave
        ret
.LC0:
        .long   2147483648
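The four instructions that touch rbp and rsp (`push rbp`, `mov rbp, rsp`, `and rsp, -32` and `leave`) only set up a frame with the stack realigned to 32 bytes. Nothing is ever stored on the stack, so they do not seem to be necessary at all.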
When `copysign_ps` is compiled with `clang++-8.0 -O2 -mavx -std=c++11`, Clang produces only the expected instructions:
.LCPI0_0:
        .long   2147483648              # 0x80000000
        .long   2147483648              # 0x80000000
        .long   2147483648              # 0x80000000
        .long   2147483648              # 0x80000000
        .long   2147483648              # 0x80000000
        .long   2147483648              # 0x80000000
        .long   2147483648              # 0x80000000
        .long   2147483648              # 0x80000000
.LCPI0_1:
        .long   2147483647              # 0x7fffffff
        .long   2147483647              # 0x7fffffff
        .long   2147483647              # 0x7fffffff
        .long   2147483647              # 0x7fffffff
        .long   2147483647              # 0x7fffffff
        .long   2147483647              # 0x7fffffff
        .long   2147483647              # 0x7fffffff
        .long   2147483647              # 0x7fffffff
copysign_ps(float __vector(8), float __vector(8)): # @copysign_ps(float __vector(8), float __vector(8))
        vandps  ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
        vandps  ymm1, ymm1, ymmword ptr [rip + .LCPI0_1]
        vorps   ymm0, ymm1, ymm0
        ret