https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104946
Bug ID: 104946
Summary: [12 regression] Suboptimal gimple folding for blendvpd under sse4.1
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
When working on PR104666, I found the following:
cat test.c
typedef double __m128d __attribute__((__vector_size__(16), __may_alias__));
__m128d sse4_1_blendvpd (__m128d a, __m128d b, __m128d c)
__attribute__((__target__("sse4.1")));
__m128d
generic_blendvpd (__m128d a, __m128d b, __m128d c)
{
  return __builtin_ia32_blendvpd (a, b, c);
}
gcc -O2 -msse4.1 -mno-sse4.2
generic_blendvpd:
movq rax, xmm2
movapd xmm3, xmm0
test rax, rax
jns .L3
movapd xmm0, xmm1
.L3:
pextrq rax, xmm2, 1
unpckhpd xmm3, xmm3
test rax, rax
jns .L5
unpckhpd xmm1, xmm1
movapd xmm3, xmm1
.L5:
unpcklpd xmm0, xmm3
ret
This is because pcmpgtq is only available under sse4.2; without it, vec_cmpv2di
is lowered to scalar operations, which are not combined back into a vector
instruction.
With sse4.2, gcc generates optimal code:
generic_blendvpd:
movapd xmm3, xmm0
movdqa xmm0, xmm2
blendvpd xmm3, xmm1, xmm0
movapd xmm0, xmm3
ret
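
For reference, here is a minimal sketch of what the folded form is presumably
equivalent to, written with GCC's generic vector extensions. It assumes the fold
rewrites the builtin as a signed 64-bit "less than zero" compare on the mask
followed by a per-lane select, which is what the vec_cmpv2di lowering above
suggests. The name folded_blendvpd and the typedefs v2df/v2di are hypothetical
and used only for illustration; this is not the actual gimple emitted by gcc.

```c
#include <assert.h>

/* Illustrative typedefs (assumptions, not taken from the report).  */
typedef double v2df __attribute__((__vector_size__(16)));
typedef long long v2di __attribute__((__vector_size__(16)));

/* Hypothetical helper: per lane, pick b when the sign bit of c is set,
   otherwise pick a.  This mirrors the documented behaviour of blendvpd.  */
v2df
folded_blendvpd (v2df a, v2df b, v2df c)
{
  static_assert (sizeof (v2df) == sizeof (v2di), "lane layouts must match");

  const v2di zero = { 0, 0 };

  /* Reinterpret the mask as signed 64-bit lanes.  A lane compares less
     than zero exactly when its sign bit is set.  This is the V2DI
     comparison that needs pcmpgtq to stay in vector form.  */
  v2di mask = (v2di) c < zero;

  /* Each mask lane is now all-ones or all-zeros, so a bitwise select
     yields b or a for that lane.  */
  return (v2df) (((v2di) b & mask) | ((v2di) a & ~mask));
}
```

Compiling this sketch with and without -msse4.2 may be a convenient way to
compare against the builtin, since the V2DI compare is the step that depends on
pcmpgtq.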