https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123238

            Bug ID: 123238
           Summary: [15/16 Regression] poorer code for conditional select
                    (VCOND vs VCOND_MASK)
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

void f(char c[])
{
    for (int i = 0; i < 8; i++)
        c[i] = c[i] ? 'a' : 'c';
}

gcc-14 -O2:
f:
        movq    xmm0, QWORD PTR [rdi]
        pxor    xmm1, xmm1
        movq    xmm2, QWORD PTR .LC0[rip]
        pcmpeqb xmm0, xmm1
        movq    xmm1, QWORD PTR .LC1[rip]
        pand    xmm1, xmm0
        pandn   xmm0, xmm2
        por     xmm0, xmm1
        movq    QWORD PTR [rdi], xmm0
        ret

gcc-15 -O2:
f:
        movq    xmm0, QWORD PTR [rdi]
        pxor    xmm1, xmm1
        movq    xmm2, QWORD PTR .LC1[rip]
        pcmpeqb xmm0, xmm1
        pcmpeqb xmm0, xmm1
        movq    xmm1, QWORD PTR .LC0[rip]
        pand    xmm1, xmm0
        pandn   xmm0, xmm2
        por     xmm0, xmm1
        movq    QWORD PTR [rdi], xmm0
        ret

Note the extra pcmpeqb that inverts the result of the comparison.
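
To see why the second pcmpeqb acts as an inversion, here is a minimal scalar sketch of a single byte lane. The helper names (pcmpeqb_lane, second_compare_inverts) are hypothetical and only model the instruction's per-lane behaviour; this is not code from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one pcmpeqb byte lane: 0xFF when equal, 0x00 otherwise. */
static uint8_t pcmpeqb_lane(uint8_t x, uint8_t y)
{
    return x == y ? 0xFF : 0x00;
}

/* Checks, for every byte value, that comparing the first mask against
   zero again yields its bitwise complement. Returns 1 on success. */
static int second_compare_inverts(void)
{
    for (int c = 0; c < 256; c++) {
        uint8_t eq = pcmpeqb_lane((uint8_t)c, 0); /* c == 0 */
        uint8_t ne = pcmpeqb_lane(eq, 0);         /* (c == 0) == 0, i.e. c != 0 */
        assert(ne == (uint8_t)~eq);
        assert(ne == (c != 0 ? 0xFF : 0x00));
    }
    return 1;
}
```

Because the mask is always all-ones or all-zeros per lane, the second compare against zero computes c != 0 from c == 0.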

In gcc-15 we have

  vect__3.8_30 = MEM <vector(8) char> [(char *)c_8(D)];
  mask__12.9_31 = vect__3.8_30 != { 0, 0, 0, 0, 0, 0, 0, 0 };
  vect_iftmp.10_32 = .VCOND_MASK (mask__12.9_31, { 97, 97, 97, 97, 97, 97, 97, 97 }, { 99, 99, 99, 99, 99, 99, 99, 99 });
  MEM <vector(8) char> [(char *)c_8(D)] = vect_iftmp.10_32;

while in gcc-14 we had

  vect__3.8_30 = MEM <vector(8) char> [(char *)c_8(D)];
  vect_iftmp.10_35 = .VCOND (vect__3.8_30, { 0, 0, 0, 0, 0, 0, 0, 0 }, { 97, 97, 97, 97, 97, 97, 97, 97 }, { 99, 99, 99, 99, 99, 99, 99, 99 }, 115);
  MEM <vector(8) char> [(char *)c_8(D)] = vect_iftmp.10_35;

Also, the straightforward translation via pand-pandn-por is not optimal. For

d = c ? a : b

where a and b are constant, we want to use the xor trick and evaluate d as

b XOR ((c CMP 0) AND (a XOR b))

(and even when a XOR b cannot be folded, the xor trick is no worse than
pand-pandn-por)
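
As a rough per-lane illustration of the xor trick, the sketch below compares it against the and/andnot/or blend for the testcase's constants. The function names are hypothetical, and the mask is assumed to be c != 0 (all-ones or all-zeros), matching the ternary in f.

```c
#include <assert.h>
#include <stdint.h>

/* One byte lane of a vector compare: all-ones if x != 0, else all-zeros. */
static uint8_t lane_ne_zero(uint8_t x)
{
    return x != 0 ? 0xFF : 0x00;
}

/* Baseline blend, modelling pand-pandn-por: (m & a) | (~m & b). */
static uint8_t select_blend(uint8_t c, uint8_t a, uint8_t b)
{
    uint8_t m = lane_ne_zero(c);
    return (uint8_t)((m & a) | (~m & b));
}

/* xor trick: b ^ (m & (a ^ b)). When a and b are constants,
   a ^ b folds to a single constant, leaving one AND and one XOR. */
static uint8_t select_xor(uint8_t c, uint8_t a, uint8_t b)
{
    uint8_t m = lane_ne_zero(c);
    return (uint8_t)(b ^ (m & (a ^ b)));
}

/* Exhaustively checks both forms against c ? 'a' : 'c'. Returns 1 on success. */
static int check_select(void)
{
    for (int c = 0; c < 256; c++) {
        uint8_t expect = c ? 'a' : 'c';
        assert(select_blend((uint8_t)c, 'a', 'c') == expect);
        assert(select_xor((uint8_t)c, 'a', 'c') == expect);
    }
    return 1;
}
```

When the mask is all-ones, b ^ (a ^ b) gives a; when it is all-zeros, the result is b unchanged.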
