On Tue, 17 Feb 2026, Roger Sayle wrote:

> 
> Perhaps the easiest way to demonstrate that tree-ssa's isel pass isn't a
> replacement for GCC's RTL expansion pass is with a concrete example where
> ISEL hurts performance (on x86_64), which should be unsurprising given
> that gimple-isel.cc doesn't once mention rtx_costs (or cost).
> 
> Consider the example:
> 
> void foo(char c[])
> {
>     for (int i = 0; i < 16; i++)
>         c[i] = c[i] != 'a';
> }
> 
> currently when compiled with -O2 -mavx2 this generates:
> 
> foo:    movl    $1633771873, %eax
>         vpxor   %xmm1, %xmm1, %xmm1
>         vmovd   %eax, %xmm0
>         vpbroadcastd    %xmm0, %xmm0
>         vpcmpeqb        (%rdi), %xmm0, %xmm0
>         vpcmpeqb        %xmm1, %xmm0, %xmm0
>         vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpabsb  %xmm1, %xmm1
>         vpand   %xmm1, %xmm0, %xmm0
>         vmovdqu %xmm0, (%rdi)
>         ret
> 
> with the attached patch, when applied on top of the previously
> posted https://gcc.gnu.org/pipermail/gcc-patches/2026-February/708351.html
> we generate the improved:
> 
>         movl    $1633771873, %eax
>         vpxor   %xmm1, %xmm1, %xmm1
>         vmovd   %eax, %xmm0
>         vpbroadcastd    %xmm0, %xmm0
>         vpcmpeqb        (%rdi), %xmm0, %xmm0
>         vpcmpeqb        %xmm1, %xmm0, %xmm0
>         vpabsb  %xmm0, %xmm0
>         vmovdqu %xmm0, (%rdi)
>         ret
> 
> The difference is that to convert a vector of 0 and -1 values to a
> vector of 0 and 1 values, we don't use AND as in "cond & {1,1,1,1...}"
> but can use (in this case) ABS or a vector logical right shift when
> available.  Clearly using vpabsb is faster, as the materialization of
> the vector "{1,1,1,1,1...}" already uses vpabsb, before the vpand.
> 
> Unfortunately, the i386-expand.cc change (which understands the various
> instruction availabilities and implicit costs) on its own is insufficient,
> because isel's gimple_expand_vec_cond_expr blindly lowers IFN_VCOND_MASK
> without letting expand or the target backend decide on the best possible
> implementation.  The patch removes these premature optimizations (the
> root of all evil).  Aside, I suspect that one cause for confusion is the
> poor naming; the "isel" pass has little to do with "instruction selection",
> so perhaps internal-fn-lowering or similar would be better.  Even the
> comment at the top of gimple-isel describes it as "Schedule GIMPLE
> vector statements".  Perhaps once tree-ssa has a way of querying the
> backend for instruction costs things will improve, but until then RTL
> expansion makes far more sense.
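
For reference, the equivalence the quoted patch relies on can be checked
per byte lane in scalar C.  This is only an illustrative sketch, not code
from the patch: the helper names below are hypothetical, and each one
models a single 8-bit lane of the vector operation.  The identity holds
only because a comparison mask lane is always 0 or -1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Each helper maps one byte lane of a comparison mask (0 or -1)
   to a boolean lane (0 or 1).  Names are hypothetical.  */

/* AND with 1: the per-lane effect of "cond & {1,1,1,...}".  */
int8_t bool_via_and(int8_t m) { return (int8_t)(m & 1); }

/* Absolute value: the per-lane effect of a byte ABS such as vpabsb.  */
int8_t bool_via_abs(int8_t m) { return (int8_t)abs(m); }

/* Logical right shift by 7: moves the sign bit down to bit 0.  */
int8_t bool_via_shift(int8_t m) { return (int8_t)((uint8_t)m >> 7); }

/* Returns 1 if all three variants agree on both possible mask values.  */
int variants_agree(void)
{
    const int8_t masks[2] = { 0, -1 };
    for (int i = 0; i < 2; i++) {
        int8_t m = masks[i];
        int8_t expected = (int8_t)(m != 0);
        if (bool_via_and(m) != expected
            || bool_via_abs(m) != expected
            || bool_via_shift(m) != expected)
            return 0;
    }
    return 1;
}
```

For arbitrary lane values the three forms differ (for example abs(2) is 2
while 2 & 1 is 0), so the substitution is only valid when the operand is
known to be a comparison result.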

Indeed the lack of efficient costing is a problem here.  The goal
of ISEL is to replace "TER", which basically mimics building
larger GENERIC expressions from GIMPLE so RTL expansion sees the
complex expressions it saw before we had GIMPLE.  "TER" has the
issue that it is applied blindly (not only where it makes a difference
for RTL expansion), causing random scheduling of statements.  And
it interacts with SSA coalescing, which can inhibit some scheduling.

The goal of ISEL is to produce an instruction stream of
trivially RTL expandable instructions by selecting optabs up-front.
As RTL costing requires fully built RTXen, this creates inefficiencies
(see how IVOPTs does costing for example).  If costing were available
based on optabs that would be an improvement (we could possibly
at least cache it on that level somehow).

Of course we do not want to replace all GIMPLE operations by
direct optab function calls, and GIMPLE cannot currently handle
multi-defs.  So the immediate goal would only be to make TER
obsolete, the main complication of which is the RTL expansion of
loads, stores and conditionals ...

> This patch has been tested (on top of the patch mentioned above) on
> x86_64-pc-linux-gnu with make bootstrap and make -k check, both with
> and without --target_board=unix{-m32} with no new failures.
> Thoughts?  (Both) Ok for stage1?

As Andrew said, this will regress the targets this was added for.
I'd rather see an attempt to add some form of costing here.

Richard.

> 
> 2026-02-17  Roger Sayle  <[email protected]>
> 
> gcc/ChangeLog
>         * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Optimize
>         case where op_false is a vector of zeros, and op_true is a vector
>         of ones, using either vector logical right shifts or vector ABS.
>         * gimple-isel.cc (gimple_expand_vec_cond_expr): Always lower
>         VEC_COND_EXPR to IFN_VCOND_MASK.  Remove the "optimization" of
>         special cases as these are best performed (by the backend) during
>         RTL expansion.
> 
> 
> Roger
> --
> 
> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
