https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963
Bug ID: 88963
Summary: gcc generates terrible code for vectors of 64+ length which are not natively supported
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: [email protected]
Target Milestone: ---
[code]
typedef int VInt __attribute__((vector_size(64)));
void test(VInt*__restrict a, VInt*__restrict b,
VInt*__restrict c)
{
*a = *b + *c;
}
[/code]
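In scalar terms, `*a = *b + *c` on this type adds 16 `int` lanes (64 bytes / 4 bytes) element by element. The sketch below spells that out with a hypothetical scalar reference function, `test_scalar`, so the vector version can be checked against a plain loop. It only covers semantics, not code quality.

```c
#include <stddef.h>

/* Same type as in the report: 64 bytes = 16 ints. */
typedef int VInt __attribute__((vector_size(64)));

/* The reported function: lane-wise addition of two 16-int vectors. */
void test(VInt *__restrict a, VInt *__restrict b, VInt *__restrict c)
{
    *a = *b + *c;
}

/* Hypothetical scalar reference: what `*a = *b + *c` computes per lane. */
void test_scalar(int *__restrict a, const int *__restrict b,
                 const int *__restrict c)
{
    for (size_t i = 0; i < sizeof(VInt) / sizeof(int); i++)
        a[i] = b[i] + c[i];
}
```

Both functions should produce identical lanes for any inputs that do not overflow.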
Compiled with -O3 -march=skylake, this code produces the following:
[asm]
test(int __vector(16)*, int __vector(16)*, int __vector(16)*):
push rbp
mov rbp, rsp
and rsp, -64
sub rsp, 136
vmovdqa xmm3, XMMWORD PTR [rsi]
vmovdqa xmm4, XMMWORD PTR [rsi+16]
vmovdqa xmm5, XMMWORD PTR [rsi+32]
vmovdqa xmm6, XMMWORD PTR [rsi+48]
vmovdqa xmm7, XMMWORD PTR [rdx]
vmovaps XMMWORD PTR [rsp-56], xmm3
vmovdqa xmm1, XMMWORD PTR [rdx+16]
vmovaps XMMWORD PTR [rsp-40], xmm4
vmovdqa ymm4, YMMWORD PTR [rsp-56]
vmovdqa xmm2, XMMWORD PTR [rdx+32]
vmovaps XMMWORD PTR [rsp-8], xmm6
vmovaps XMMWORD PTR [rsp+8], xmm7
vmovdqa xmm3, XMMWORD PTR [rdx+48]
vmovaps XMMWORD PTR [rsp-24], xmm5
vmovaps XMMWORD PTR [rsp+24], xmm1
vpaddd ymm0, ymm4, YMMWORD PTR [rsp+8]
vmovdqa ymm5, YMMWORD PTR [rsp-24]
vmovaps XMMWORD PTR [rsp+40], xmm2
vmovaps XMMWORD PTR [rsp+56], xmm3
vmovdqa xmm2, xmm0
vmovdqa YMMWORD PTR [rsp-120], ymm0
vpaddd ymm0, ymm5, YMMWORD PTR [rsp+40]
vmovdqa xmm6, XMMWORD PTR [rsp-104]
vmovdqa YMMWORD PTR [rsp-88], ymm0
vmovdqa xmm7, XMMWORD PTR [rsp-72]
vmovaps XMMWORD PTR [rdi], xmm2
vmovaps XMMWORD PTR [rdi+16], xmm6
vmovaps XMMWORD PTR [rdi+32], xmm0
vmovaps XMMWORD PTR [rdi+48], xmm7
vzeroupper
leave
ret
[/asm]
Other compilers (clang, icc) produce much better code. This is clang's output:
[asm]
test(int __vector(16)*, int __vector(16)*, int __vector(16)*): # @test(int __vector(16)*, int __vector(16)*, int __vector(16)*)
vmovdqa ymm0, ymmword ptr [rdx]
vmovdqa ymm1, ymmword ptr [rdx + 32]
vpaddd ymm0, ymm0, ymmword ptr [rsi]
vpaddd ymm1, ymm1, ymmword ptr [rsi + 32]
vmovdqa ymmword ptr [rdi + 32], ymm1
vmovdqa ymmword ptr [rdi], ymm0
vzeroupper
ret
[/asm]
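The clang output amounts to splitting the 64-byte vector into two 32-byte halves, with one `vpaddd ymm` per half. A possible source-level workaround is to do that split by hand with a 32-byte vector type. This is only a sketch: `VHalf` and `test_split` are hypothetical names, and whether gcc then emits code like clang's has not been verified here.

```c
#include <string.h>

typedef int VInt  __attribute__((vector_size(64))); /* 16 ints, as in the report */
typedef int VHalf __attribute__((vector_size(32))); /* hypothetical 8-int half (one ymm register) */

/* Hypothetical workaround: perform the 64-byte add as two 32-byte adds,
   mirroring clang's two vpaddd ymm instructions. The low half holds
   lanes 0..7, the high half lanes 8..15. */
void test_split(VInt *__restrict a, const VInt *__restrict b,
                const VInt *__restrict c)
{
    VHalf b_lo, b_hi, c_lo, c_hi, r_lo, r_hi;

    /* memcpy avoids aliasing/alignment assumptions about the halves. */
    memcpy(&b_lo, (const char *)b, sizeof b_lo);
    memcpy(&b_hi, (const char *)b + sizeof b_lo, sizeof b_hi);
    memcpy(&c_lo, (const char *)c, sizeof c_lo);
    memcpy(&c_hi, (const char *)c + sizeof c_lo, sizeof c_hi);

    r_lo = b_lo + c_lo;
    r_hi = b_hi + c_hi;

    memcpy((char *)a, &r_lo, sizeof r_lo);
    memcpy((char *)a + sizeof r_lo, &r_hi, sizeof r_hi);
}
```

The fixed-size `memcpy` calls are usually folded into plain vector loads and stores by gcc and clang, so they should not cost extra calls at -O3.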
gcc produces good code with -O3 -march=skylake-avx512, where a 64-byte vector is natively supported. It also produces good code for vector_size(32) with AVX disabled. However, for vector_size(128) with -O3 -march=skylake-avx512, the code is poor again.
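The other configurations mentioned above can be reproduced with variants of the original function. This is a sketch: `VInt32B`, `VInt128B`, `test32` and `test128` are hypothetical names, and `-mno-avx` is assumed as the way to disable AVX.

```c
/* 32 bytes = 8 ints; reported good with AVX disabled (assumed: -O3 -mno-avx). */
typedef int VInt32B __attribute__((vector_size(32)));

/* 128 bytes = 32 ints; reported poor with -O3 -march=skylake-avx512. */
typedef int VInt128B __attribute__((vector_size(128)));

/* Same body as the original test(), only the vector width differs. */
void test32(VInt32B *__restrict a, VInt32B *__restrict b,
            VInt32B *__restrict c)
{
    *a = *b + *c;
}

void test128(VInt128B *__restrict a, VInt128B *__restrict b,
             VInt128B *__restrict c)
{
    *a = *b + *c;
}
```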