https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110485
Bug ID: 110485
Summary: vectorizing simd clone calls without loop masking applied
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
#include <math.h>
double a[1024];
double b[1024];
void foo (int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = pow (b[i], 71.2);
}
Compiling with -Ofast -march=znver4 --param vect-partial-vector-usage=1 gives
the following main loop, which is OK:
.L4:
        vmovapd b(%rbx), %zmm0
        vmovapd -112(%rbp), %zmm1
        addq    $64, %rbx
        call    _ZGVeN8vv_pow
        vmovapd %zmm0, a-64(%rbx)
        cmpq    %r13, %rbx
        jne     .L4
but the following vectorized masked epilogue:
        movl    %r12d, %eax
        andl    $-8, %eax
        testb   $7, %r12b
        je      .L13
.L3:
        subl    %eax, %r12d
        movl    %eax, %edx
        vmovapd -112(%rbp), %zmm1
        vpbroadcastw    %r12d, %xmm0
        leaq    0(,%rdx,8), %rbx
        vpcmpuw $6, .LC2(%rip), %xmm0, %k1
        vmovapd b(,%rdx,8), %zmm0{%k1}{z}
        kmovb   %k1, -113(%rbp)
        call    _ZGVeN8vv_pow
        kmovb   -113(%rbp), %k1
        vmovapd %zmm0, a(%rbx){%k1}
so we simply call _ZGVeN8vv_pow without any masking applied. That's
possibly OK, since the load uses zero-masking and the masked-off argument
lanes are therefore zero, but this doesn't seem to be the expected behavior
for vectorizable_simd_clone_call. Should it instead unconditionally
set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) to false?
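For reference, the clone name alone says this is the unmasked variant: in the
x86-64 vector function ABI mangling _ZGV<isa><mask><vlen><params>_<name>, the
<mask> letter is N for an unmasked (notinbranch) clone and M for a masked
(inbranch) one. A minimal sketch of decoding that follows; parse_clone_name
and struct clone_name are hypothetical helpers written for illustration, not
anything that exists in GCC.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Fields of a clone name: _ZGV<isa><mask><vlen><params>_<scalar name>.
   Hypothetical type, for illustration only.  */
struct clone_name
{
  char isa;            /* 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512 */
  int masked;          /* 1 for 'M' (inbranch), 0 for 'N' (notinbranch) */
  int vlen;            /* number of lanes */
  const char *scalar;  /* points into the input string */
};

/* Hypothetical helper: returns 0 on success, -1 on malformed input.  */
static int
parse_clone_name (const char *s, struct clone_name *out)
{
  assert (s != NULL && out != NULL);
  if (strncmp (s, "_ZGV", 4) != 0)
    return -1;
  s += 4;
  if (s[0] == '\0' || (s[1] != 'M' && s[1] != 'N'))
    return -1;
  out->isa = s[0];
  out->masked = (s[1] == 'M');
  s += 2;
  if (!isdigit ((unsigned char) *s))
    return -1;
  char *end;
  out->vlen = (int) strtol (s, &end, 10);
  /* Skip the parameter letters; the first '_' starts the scalar name.  */
  const char *sep = strchr (end, '_');
  if (sep == NULL || sep[1] == '\0')
    return -1;
  out->scalar = sep + 1;
  return 0;
}
```

With this, "_ZGVeN8vv_pow" decodes as isa 'e', unmasked, 8 lanes, scalar name
"pow". Under the same scheme a masked clone of that shape would be spelled
_ZGVeM8vv_pow.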
Is there a way to query whether a SIMD clone is "happy" with zero-valued
arguments in the inactive lanes and thus, for example with -ffast-math,
would be OK to run unmasked?
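To make the zero-argument question concrete, here is a scalar model of the
epilogue above: zero-masked load, unmasked 8-lane call, masked store. It is
only a sketch. masked_epilogue_model and recip_lane are names made up here,
and recip_lane is a stand-in for some clone whose scalar function misbehaves
at 0.0; it is not pow.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 8  /* doubles per zmm register */

/* Hypothetical lane function that is NOT happy with a zero argument:
   1.0 / 0.0 yields +inf and raises the divide-by-zero exception.  */
static double
recip_lane (double x)
{
  return 1.0 / x;
}

/* Scalar model of the masked epilogue.  `f` stands in for one lane of
   the unmasked clone.  Returns how many inactive (zeroed) lanes were
   evaluated anyway.  */
static size_t
masked_epilogue_model (double (*f) (double), const double *src,
                       double *dst, size_t remaining)
{
  double in[VLEN], out[VLEN];
  size_t inactive = 0;

  assert (remaining <= VLEN);

  /* vmovapd b(,%rdx,8), %zmm0{%k1}{z}: inactive lanes read as 0.0.  */
  for (size_t l = 0; l < VLEN; ++l)
    in[l] = l < remaining ? src[l] : 0.0;

  /* call _ZGVeN8vv_pow: every lane is computed, mask or no mask.  */
  for (size_t l = 0; l < VLEN; ++l)
    {
      out[l] = f (in[l]);
      if (l >= remaining)
        ++inactive;
    }

  /* vmovapd %zmm0, a(%rbx){%k1}: only active lanes reach memory.  */
  for (size_t l = 0; l < remaining; ++l)
    dst[l] = out[l];

  return inactive;
}
```

With recip_lane and 3 remaining elements, the 5 inactive lanes each compute
1.0 / 0.0. The infinities never reach memory, but the divide-by-zero
exception is still raised, which only stops mattering under something like
-ffast-math (which implies -fno-trapping-math). pow (0.0, 71.2), by contrast,
is just 0.0.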