On Thu, Feb 12, 2026 at 12:20 AM Roger Sayle <[email protected]> wrote:
>
>
> This patch implements Alexander Monakov's suggestion from PR 123238.
> Traditionally, the x86_64 backend implements VCOND_MASK using a three
> instruction sequence of pand, pandn and por (requiring three registers),
> however when op_true and op_false are both constant vectors, this can
> be done using just two instructions, pand and pxor (requiring only two
> registers).  This requires delaying forcing const_vector operands to
> memory (the constant pool) as late as possible, including changing the
> predicates on the define_expand patterns that call ix86_expand_sse_movcc
> to (consistently) accept vector_or_const_vector_operand.

I wonder why simplify-rtx doesn't eventually pick this up?  We should
have REG_EQUAL notes exposing the CONST_VECTORs?  But maybe
I'm dreaming up RTL features here ;)
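
For reference, the identity behind the AND/XOR form is
(mask & T) | (~mask & F) == ((T ^ F) & mask) ^ F.  It holds bit by bit
for any mask, not only for all-ones/all-zeros lanes.  Here is a minimal
scalar C sketch that checks it exhaustively over all byte values.  The
helper names are hypothetical and are not anything in GCC:

```c
#include <stdbool.h>
#include <stdint.h>

/* Three-operation select, the shape of the pand/pandn/por sequence:
   bits of t where mask is set, bits of f where it is clear.  */
static uint8_t
select_and_andn_or (uint8_t mask, uint8_t t, uint8_t f)
{
  return (uint8_t) ((mask & t) | (~mask & f));
}

/* Two-operation select: when t and f are constants, t ^ f folds to a
   constant, leaving one AND and one XOR at run time.  */
static uint8_t
select_and_xor (uint8_t mask, uint8_t t, uint8_t f)
{
  return (uint8_t) (((t ^ f) & mask) ^ f);
}

/* Exhaustively compare both forms over every (mask, t, f) byte triple.  */
static bool
selects_agree (void)
{
  for (unsigned m = 0; m < 256; m++)
    for (unsigned t = 0; t < 256; t++)
      for (unsigned f = 0; f < 256; f++)
        if (select_and_andn_or (m, t, f) != select_and_xor (m, t, f))
          return false;
  return true;
}
```

With T = 'a' and F = 'c', T ^ F is 2.  This matches the {2,2,2...}
constant loaded from .LC2 in the "after" code below.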

>
> void f(char c[])
> {
>     for (int i = 0; i < 8; i++)
>         c[i] = c[i] ? 'a' : 'c';
> }
>
> Before with -O2 (11 instructions):
> f:      movq    (%rdi), %xmm0
>         pxor    %xmm1, %xmm1
>         movq    .LC1(%rip), %xmm2       // {'c','c','c'...}
>         pcmpeqb %xmm1, %xmm0
>         pcmpeqb %xmm1, %xmm0
>         movq    .LC0(%rip), %xmm1       // {'a','a','a'...}
>         pand    %xmm0, %xmm1
>         pandn   %xmm2, %xmm0
>         por     %xmm1, %xmm0
>         movq    %xmm0, (%rdi)
>         ret
>
> After with -O2 (10 instructions):
> f:      movq    (%rdi), %xmm0
>         pxor    %xmm1, %xmm1
>         pcmpeqb %xmm1, %xmm0
>         pcmpeqb %xmm1, %xmm0
>         movq    .LC2(%rip), %xmm1       // {2,2,2...}
>         pand    %xmm1, %xmm0
>         movq    .LC1(%rip), %xmm1       // {'c','c','c'...}
>         pxor    %xmm1, %xmm0
>         movq    %xmm0, (%rdi)
>         ret
>
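
At the source level, the "after" sequence corresponds roughly to the
sketch below, which uses the GCC/Clang vector extensions.  The typedef
v8qi and the names f_ref and f_and_xor are illustrative and are not taken
from the patch; f_ref just repeats the function quoted above.  The sketch
computes the lane mask once and then applies a single AND and a single
XOR against constant vectors:

```c
#include <string.h>

/* GCC/Clang vector extension: eight signed bytes, the width of one movq.  */
typedef signed char v8qi __attribute__ ((vector_size (8)));

/* Reference version: the loop from the quoted example.  */
void
f_ref (char c[])
{
  for (int i = 0; i < 8; i++)
    c[i] = c[i] ? 'a' : 'c';
}

/* Hand-written AND/XOR select mirroring the "after" sequence.  */
void
f_and_xor (char c[])
{
  const v8qi zero = { 0 };
  const v8qi t = { 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a' };
  const v8qi f = { 'c', 'c', 'c', 'c', 'c', 'c', 'c', 'c' };
  v8qi x, mask, r;

  memcpy (&x, c, sizeof x);     /* load 8 bytes (movq) */
  mask = x != zero;             /* -1 where c[i] != 0, else 0; the quoted
                                   code gets this with two pcmpeqb */
  r = (mask & (t ^ f)) ^ f;     /* t ^ f is the {2,2,...} constant: pand, pxor */
  memcpy (c, &r, sizeof r);     /* store 8 bytes (movq) */
}
```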
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for stage1?  I'm posting this now so the
> suggestion doesn't get lost, if/when PR 123238 is closed after the
> regression is fixed.
>
>
> 2026-02-11  Roger Sayle  <[email protected]>
>
> gcc/ChangeLog
>         PR target/123238
>         * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Delay
>         calling force_reg on op_true and op_false.  Generate an AND
>         then XOR sequence if op_true and op_false are both
>         CONST_VECTOR_P.
>         * config/i386/mmx.md (vcond_mask_<mode>v4hi): Allow operands
>         1 and 2 to be vector_or_const_vector_operand.
>         (vcond_mask_<mode>v2hi): Likewise.
>         (vcond_mask_<mode><mmxintvecmodelower>): Likewise.
>         (vcond_mask_<mode><mode>): Likewise.
>         * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
>         Likewise.
>         (vcond_mask_<mode><sseintvecmodelower>): Likewise.
>         (vcond_mask_v1tiv1ti): Likewise.
>         (vcond_mask_<mode><sseintvecmodelower>): Likewise.
>         (vcond_mask_<mode><sseintvecmodelower>): Likewise.
>
> gcc/testsuite/ChangeLog
>         PR target/123238
>         * gcc.target/i386/pr123238-2.c: New test case.
>
>
> Roger
> --
>
