https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123238
Bug ID: 123238
Summary: [15/16 Regression] poorer code for conditional select (VCOND vs VCOND_MASK)
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: amonakov at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
void f(char c[])
{
    for (int i = 0; i < 8; i++)
        c[i] = c[i] ? 'a' : 'c';
}
gcc-14 -O2:
f:
movq xmm0, QWORD PTR [rdi]
pxor xmm1, xmm1
movq xmm2, QWORD PTR .LC0[rip]
pcmpeqb xmm0, xmm1
movq xmm1, QWORD PTR .LC1[rip]
pand xmm1, xmm0
pandn xmm0, xmm2
por xmm0, xmm1
movq QWORD PTR [rdi], xmm0
ret
gcc-15 -O2:
f:
movq xmm0, QWORD PTR [rdi]
pxor xmm1, xmm1
movq xmm2, QWORD PTR .LC1[rip]
pcmpeqb xmm0, xmm1
pcmpeqb xmm0, xmm1
movq xmm1, QWORD PTR .LC0[rip]
pand xmm1, xmm0
pandn xmm0, xmm2
por xmm0, xmm1
movq QWORD PTR [rdi], xmm0
ret
Note the extra pcmpeqb that inverts the result of the comparison.
In gcc-15 we have
vect__3.8_30 = MEM <vector(8) char> [(char *)c_8(D)];
mask__12.9_31 = vect__3.8_30 != { 0, 0, 0, 0, 0, 0, 0, 0 };
vect_iftmp.10_32 = .VCOND_MASK (mask__12.9_31, { 97, 97, 97, 97, 97, 97, 97, 97 }, { 99, 99, 99, 99, 99, 99, 99, 99 });
MEM <vector(8) char> [(char *)c_8(D)] = vect_iftmp.10_32;
while in gcc-14 we had
vect__3.8_30 = MEM <vector(8) char> [(char *)c_8(D)];
vect_iftmp.10_35 = .VCOND (vect__3.8_30, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 97, 97, 97, 97, 97, 97, 97, 97 }, { 99, 99, 99, 99, 99, 99, 99, 99 }, 115);
MEM <vector(8) char> [(char *)c_8(D)] = vect_iftmp.10_35;
Also, the straightforward translation via pand-pandn-por is not optimal. For
d = c ? a : b
where a and b are constant, we want to use the xor trick and evaluate d as
b XOR ((c CMP 0) AND (a XOR b))
(and even when a XOR b cannot be folded, the xor trick is no worse than
pand-pandn-por).
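
A minimal scalar sketch of the xor trick, one byte at a time, may make the equivalence easier to check. It is not GCC code: the helper names (mask_eq_zero, select_blend, select_xor, f_byte, check_selects_agree) are hypothetical and exist only for illustration. The sketch assumes the mask is the one pcmpeqb against zero produces (0xFF where c == 0), so the two constants are swapped rather than the mask inverted. When a and b are constants, a XOR b folds at compile time, leaving one AND and one XOR instead of AND, ANDN and OR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: per-byte result of pcmpeqb against zero,
   i.e. 0xFF when c == 0 and 0x00 otherwise. */
static uint8_t mask_eq_zero(uint8_t c)
{
    return c == 0 ? 0xFF : 0x00;
}

/* pand-pandn-por form: picks a where the mask is set, b elsewhere. */
static uint8_t select_blend(uint8_t m, uint8_t a, uint8_t b)
{
    return (uint8_t)((m & a) | (~m & b));
}

/* xor trick: b XOR (m AND (a XOR b)); same result as select_blend
   whenever m is all-ones or all-zeros. */
static uint8_t select_xor(uint8_t m, uint8_t a, uint8_t b)
{
    return (uint8_t)(b ^ (m & (a ^ b)));
}

/* One lane of f() from the report: c ? 'a' : 'c'. The mask is set
   when c == 0, so 'c' goes in the "mask set" slot and no second
   compare is needed to invert the mask. */
static uint8_t f_byte(uint8_t c)
{
    return select_xor(mask_eq_zero(c), 'c', 'a');
}

/* Exhaustive check over all byte values. */
static void check_selects_agree(void)
{
    for (int c = 0; c < 256; c++) {
        uint8_t m = mask_eq_zero((uint8_t)c);
        assert(select_blend(m, 'c', 'a') == select_xor(m, 'c', 'a'));
        assert(f_byte((uint8_t)c) == (c ? 'a' : 'c'));
    }
}
```

Calling check_selects_agree() walks all 256 byte values and asserts that both forms agree with the original ternary expression.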