> On Thu, Feb 12, 2026 at 12:20 AM Roger Sayle <[email protected]>
> wrote:
> > This patch implements Alexander Monakov's suggestion from PR 123238.
> > Traditionally, the x86_64 backend implements VCOND_MASK using a three
> > instruction sequence of pand, pandn and por (requiring three
> > registers), however when op_true and op_false are both constant
> > vectors, this can be done using just two instructions, pand and pxor
> > (requiring only two registers).  This requires delaying forcing
> > const_vector operands to memory (the constant pool) as late as
> > possible, including changing the predicates on the define_expand
> > patterns that call ix86_expand_sse_movcc to (consistently) accept
> > vector_or_const_vector_operand.
> 
> I wonder why simplify-rtx doesn't eventually pick this up?  We should have
> REG_EQUAL notes exposing the CONST_VECTORs?  But maybe I'm dreaming up
> RTL features here ;)

A very reasonable question.  The answer (as is often the case) is combine's
instruction limit: this sequence would theoretically require a six-instruction
"combine".  The comparison that generates the mask is one instruction, negating
it is two instructions, and the following vec_merge is three instructions.  So
spotting an inverted comparison followed by a vec_merge is just beyond combine's
capabilities.
 
One approach I looked at was lowering to x86 instructions later, using
define_insn_and_split patterns: for example, one for negated comparisons and
another to emulate vblend/vec_merge.  However, these potentially interfere
with things like ternlog recognition and CSE of CONST0/CONST1 vectors.

> We should have REG_EQUAL notes exposing the CONST_VECTORs?
This is also on my todo list: lower CONST_VECTORs later in the x86_64 backend,
so that combine and the early optimizers see the "constant", not just the
sequence of instructions that materialize it.  It also gives STV the ability to
use constants, rather than emit sequences via expand.

> >
> > void f(char c[])
> > {
> >     for (int i = 0; i < 8; i++)
> >         c[i] = c[i] ? 'a' : 'c';
> > }
> >
> > Before with -O2 (11 instructions):
> > f:      movq    (%rdi), %xmm0
> >         pxor    %xmm1, %xmm1
> >         movq    .LC1(%rip), %xmm2       // {'c','c','c'...}
> >         pcmpeqb %xmm1, %xmm0
> >         pcmpeqb %xmm1, %xmm0
> >         movq    .LC0(%rip), %xmm1       // {'a','a','a'...}
> >         pand    %xmm0, %xmm1
> >         pandn   %xmm2, %xmm0
> >         por     %xmm1, %xmm0
> >         movq    %xmm0, (%rdi)
> >         ret
> >
> > After with -O2 (10 instructions):
> > f:      movq    (%rdi), %xmm0
> >         pxor    %xmm1, %xmm1
> >         pcmpeqb %xmm1, %xmm0
> >         pcmpeqb %xmm1, %xmm0
> >         movq    .LC2(%rip), %xmm1       // {2,2,2...}
> >         pand    %xmm1, %xmm0
> >         movq    .LC1(%rip), %xmm1       // {'c','c','c'...}
> >         pxor    %xmm1, %xmm0
> >         movq    %xmm0, (%rdi)
> >         ret
> >
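
As a rough illustration of why two instructions suffice when both arms are
constants, here is a scalar C sketch of the identity, one byte at a time.
The helper names (select_and_andn_or, select_and_xor, f_model) are made up
for this example and are not part of the patch; the sketch assumes the mask
byte is either all-ones or all-zeros, as a vector comparison would produce.
It also shows where the {2,2,2...} constant comes from: 'a' ^ 'c' is 0x02.

```c
#include <stdint.h>

/* Classic select, mirroring pand/pandn/por: (mask & t) | (~mask & f). */
static uint8_t select_and_andn_or(uint8_t mask, uint8_t t, uint8_t f)
{
    return (uint8_t)((mask & t) | (~mask & f));
}

/* Select for constant arms, mirroring pand/pxor: (mask & (t ^ f)) ^ f.
   When t and f are constants, t ^ f can be folded at compile time, so
   only the AND and the XOR remain at run time.  */
static uint8_t select_and_xor(uint8_t mask, uint8_t t, uint8_t f)
{
    return (uint8_t)((mask & (t ^ f)) ^ f);
}

/* Byte-wise model of f() from the example above (hypothetical helper).
   The mask is 0xff where c[i] is nonzero and 0x00 otherwise.  */
static void f_model(unsigned char c[8])
{
    for (int i = 0; i < 8; i++) {
        uint8_t mask = c[i] ? 0xff : 0x00;
        c[i] = select_and_xor(mask, 'a', 'c');
    }
}
```

With an all-ones mask the XOR of (t ^ f) with f yields t, and with an
all-zeros mask it yields f, so both helpers agree for such masks.
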
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for stage1?  I'm posting this now so the
> > suggestion doesn't get lost, if/when PR 123238 is closed after the
> > regression is fixed.
> >
> >
> > 2026-02-11  Roger Sayle  <[email protected]>
> >
> > gcc/ChangeLog
> >         PR target/123238
> >         * config/i386/i386-expand.cc: Delay calling force_reg on
> >         op_true and op_false.  Generate an AND then XOR sequence
> >         if op_true and op_false are both CONST_VECTOR_P.
> >         * config/i386/mmx.md (vcond_mask_<mode>v4hi): Allow operands
> >         1 and 2 to be vector_or_const_vector_operand.
> >         (vcond_mask_<mode>v2hi): Likewise.
> >         (vcond_mask_<mode><mmxintvecmodelower>): Likewise.
> >         (vcond_mask_<mode><mode>): Likewise.
> >         * config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
> >         Likewise.
> >         (vcond_mask_<mode><sseintvecmodelower>): Likewise.
> >         (vcond_mask_v1tiv1ti): Likewise.
> >         (vcond_mask_<mode><sseintvecmodelower>): Likewise.
> >         (vcond_mask_<mode><sseintvecmodelower>): Likewise.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/123238
> >         * gcc.target/i386/pr123238-2.c: New test case.
> >
> >
> > Roger
> > --
> >
