RE: [PATCH] Improve vector increment/decrement on x86.

Liu, Hongtao Thu, 14 May 2026 18:46:51 -0700

> -----Original Message-----
> From: Roger Sayle <[email protected]>
> Sent: Friday, May 15, 2026 5:23 AM
> To: 'GCC Patches' <[email protected]>
> Cc: 'Hongtao Liu' <[email protected]>; Liu, Hongtao
> <[email protected]>; 'Uros Bizjak' <[email protected]>
> Subject: [PATCH] Improve vector increment/decrement on x86.
> 
> 
> This patch improves the code generated by the i386 backend for incrementing
> (adding one to) and decrementing (subtracting one from) a vector.  With SSE,
> materializing the vector -1 is more efficient than materializing the vector
> +1, hence x + 1 (increment) is better expressed as x - (-1), and x - 1
> (decrement) is better expressed as x + (-1).  Conveniently, the relevant
> additions and subtractions are specified as a single pattern, using a
> plusminus iterator, in the machine description.

Can we add a pre-reload define_insn_and_split for them, to match RTL such as:

(set (reg:V16QI 100 [ _2 ])
    (minus:V16QI (reg:V16QI 107 [ x ])
        (const_vector:V16QI [
                (const_int 1 [0x1]) repeated x16
            ])))

In theory, that should capture more optimization opportunities (for cases
where the vector +/-1 is only exposed by RTL optimization).
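
For reference, the identity the patch relies on can be checked at the source
level.  The following is only a minimal sketch using GCC's generic vector
extensions, not code from the patch; the helper names (minus_one, inc_via_sub,
dec_via_add, identity_holds) are made up for illustration.  It uses unsigned
lanes so that wrap-around is well defined, and builds the all-ones vector from
a self-comparison, which is what the vpcmpeqd idiom does in the quoted output.

```c
#include <string.h>

typedef unsigned char v16uqi __attribute__ ((vector_size (16)));

/* A vector comparison yields -1 (all bits set) in every lane where it is
   true, so x == x is the all-ones vector regardless of the value of x.  */
static v16uqi minus_one (v16uqi x) { return (v16uqi) (x == x); }

/* x + 1 rewritten as x - (-1).  */
static v16uqi inc_via_sub (v16uqi x) { return x - minus_one (x); }

/* x - 1 rewritten as x + (-1).  */
static v16uqi dec_via_add (v16uqi x) { return x + minus_one (x); }

/* Byte-wise comparison of two vectors.  */
static int same (v16uqi a, v16uqi b)
{
  return memcmp (&a, &b, sizeof a) == 0;
}

/* Returns 1 if both rewrites agree with the direct forms on a sample
   input whose lanes include the wrap-around cases 0 and 255.  */
int identity_holds (void)
{
  v16uqi x;
  for (int i = 0; i < 16; i++)
    x[i] = (unsigned char) (i * 17);   /* 0, 17, ..., 255 */
  return same (inc_via_sub (x), x + 1) && same (dec_via_add (x), x - 1);
}
```

Calling identity_holds () from a small driver is enough to exercise both
rewrites; whether the compiler actually emits the shorter sequence for these
helpers depends on the target options and on the patch being applied.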


> 
> For the four example functions:
> 
> typedef char v16sqi __attribute__ ((vector_size(16)));
> typedef unsigned char v16uqi __attribute__ ((vector_size(16)));
> 
> v16sqi sadd1(v16sqi x) { return x+1; }
> v16uqi uadd1(v16uqi x) { return x+1; }
> v16sqi saddm1(v16sqi x) { return x-1; }
> v16uqi uaddm1(v16uqi x) { return x-1; }
> 
> GCC with -O2 -mavx2 previously generated:
> 
> sadd1:  vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpabsb  %xmm1, %xmm1
>         vpaddb  %xmm1, %xmm0, %xmm0
>         ret
> 
> uadd1:  vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpabsb  %xmm1, %xmm1
>         vpaddb  %xmm1, %xmm0, %xmm0
>         ret
> 
> saddm1: vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpabsb  %xmm1, %xmm1
>         vpsubb  %xmm1, %xmm0, %xmm0
>         ret
> 
> uaddm1: vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpaddb  %xmm1, %xmm0, %xmm0
>         ret
> 
> With this patch, we now consistently generate:
> 
> sadd1:  vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpsubb  %xmm1, %xmm0, %xmm0
>         ret
> 
> uadd1:  vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpsubb  %xmm1, %xmm0, %xmm0
>         ret
> 
> saddm1: vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpaddb  %xmm1, %xmm0, %xmm0
>         ret
> 
> uaddm1: vpcmpeqd        %xmm1, %xmm1, %xmm1
>         vpaddb  %xmm1, %xmm0, %xmm0
>         ret
> 
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no
> new failures.  Ok for mainline?
> 
> 
> 2026-05-14  Roger Sayle  <[email protected]>
> 
> gcc/ChangeLog
>         * config/i386/sse.md (<plusminus><mode>3): Accept a CONST_VECTOR
>         as the second operand.  If the second operand is CONST1_RTX,
>         canonicalize to use CONSTM1_RTX instead.
> 
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/avx512f-simd-1.c: Tweak test case.
>         * gcc.target/i386/sse2-paddb-2.c: New test case.
>         * gcc.target/i386/sse2-paddd-2.c: Likewise.
>         * gcc.target/i386/sse2-paddw-2.c: Likewise.
>         * gcc.target/i386/sse2-psubb-2.c: Likewise.
>         * gcc.target/i386/sse2-psubd-2.c: Likewise.
>         * gcc.target/i386/sse2-psubw-2.c: Likewise.
> 
> 
> Thanks in advance,
> Roger
> --