> -----Original Message-----
> From: Roger Sayle <[email protected]>
> Sent: Friday, May 15, 2026 5:23 AM
> To: 'GCC Patches' <[email protected]>
> Cc: 'Hongtao Liu' <[email protected]>; Liu, Hongtao
> <[email protected]>; 'Uros Bizjak' <[email protected]>
> Subject: [PATCH] Improve vector increment/decrement on x86.
>
>
> This patch improves the code generated by the i386 backend for incrementing
> (adding one to) and decrementing (subtracting one from) a vector. With SSE
> materializing the vector -1 is more efficient than materializing the vector
> +1, hence x + 1 (increment) is better expressed as x - (-1), and x - 1
> (decrement) is better expressed as x + (-1). Conveniently the relevant
> additions and subtractions are specified as a single pattern, using a
> plusminus iterator, in the machine description.
Can we add a pre-reload define_insn_and_split for them? For example:

(set (reg:V16QI 100 [ _2 ])
     (minus:V16QI (reg:V16QI 107 [ x ])
                  (const_vector:V16QI [
                      (const_int 1 [0x1]) repeated x16
                  ])))

Theoretically, it should be able to capture more optimization opportunities
(if a vector +/-1 is only exposed through RTL optimization).
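
As a quick sanity check of the identity the patch relies on (x + 1 == x - (-1)
and x - 1 == x + (-1) under byte wraparound), here is a minimal sketch in C
using the GCC vector extension. It is not taken from the patch: the helper
names (splat, inc_via_m1, dec_via_m1, identity_holds, check_all_bytes) are
made up for illustration, and it only checks the arithmetic, not the
instructions GCC actually emits.

```c
#include <assert.h>
#include <string.h>

typedef unsigned char v16uqi __attribute__ ((vector_size(16)));

/* Hypothetical helper: broadcast one byte to all 16 lanes. */
static v16uqi splat(unsigned char b)
{
    v16uqi v = {0};
    for (int i = 0; i < 16; i++)
        v[i] = b;
    return v;
}

/* Increment expressed as x - (-1); 0xff is -1 in every byte lane. */
static v16uqi inc_via_m1(v16uqi x)
{
    return x - splat(0xff);
}

/* Decrement expressed as x + (-1). */
static v16uqi dec_via_m1(v16uqi x)
{
    return x + splat(0xff);
}

/* Returns 1 if both rewritten forms agree with the plain x + 1 and x - 1
   for a vector whose lanes all hold the byte b. */
static int identity_holds(unsigned char b)
{
    v16uqi x = splat(b), one = splat(1);
    v16uqi inc = inc_via_m1(x), inc_ref = x + one;
    v16uqi dec = dec_via_m1(x), dec_ref = x - one;
    return memcmp(&inc, &inc_ref, sizeof inc) == 0
        && memcmp(&dec, &dec_ref, sizeof dec) == 0;
}

/* Exhaustively check every byte value, including the wraparound cases. */
static void check_all_bytes(void)
{
    for (int b = 0; b < 256; b++)
        assert(identity_holds((unsigned char) b));
}
```

Calling check_all_bytes() covers 0x00 and 0xff as well, so the wraparound
cases are included. The same reasoning should carry over to the wider
element sizes, since the all-ones vector is -1 for any integer lane width.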
>
> For the four example functions:
>
> typedef char v16sqi __attribute__ ((vector_size(16)));
> typedef unsigned char v16uqi __attribute__ ((vector_size(16)));
>
> v16sqi sadd1(v16sqi x) { return x+1; }
> v16uqi uadd1(v16uqi x) { return x+1; }
> v16sqi saddm1(v16sqi x) { return x-1; }
> v16uqi uaddm1(v16uqi x) { return x-1; }
>
> GCC with -O2 -mavx2 previously generated:
>
> sadd1:  vpcmpeqd  %xmm1, %xmm1, %xmm1
>         vpabsb    %xmm1, %xmm1
>         vpaddb    %xmm1, %xmm0, %xmm0
>         ret
>
> uadd1:  vpcmpeqd  %xmm1, %xmm1, %xmm1
>         vpabsb    %xmm1, %xmm1
>         vpaddb    %xmm1, %xmm0, %xmm0
>         ret
>
> saddm1: vpcmpeqd  %xmm1, %xmm1, %xmm1
>         vpabsb    %xmm1, %xmm1
>         vpsubb    %xmm1, %xmm0, %xmm0
>         ret
>
> uaddm1: vpcmpeqd  %xmm1, %xmm1, %xmm1
>         vpaddb    %xmm1, %xmm0, %xmm0
>         ret
>
> With this patch, we now consistently generate:
>
> sadd1:  vpcmpeqd  %xmm1, %xmm1, %xmm1
>         vpsubb    %xmm1, %xmm0, %xmm0
>         ret
>
> uadd1:  vpcmpeqd  %xmm1, %xmm1, %xmm1
>         vpsubb    %xmm1, %xmm0, %xmm0
>         ret
>
> saddm1: vpcmpeqd  %xmm1, %xmm1, %xmm1
>         vpaddb    %xmm1, %xmm0, %xmm0
>         ret
>
> uaddm1: vpcmpeqd  %xmm1, %xmm1, %xmm1
>         vpaddb    %xmm1, %xmm0, %xmm0
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32}, with no
> new failures. Ok for mainline?
>
>
> 2026-05-14 Roger Sayle <[email protected]>
>
> gcc/ChangeLog
>	* config/i386/sse.md (<plusminus><mode>3): Accept a CONST_VECTOR
>	as the second operand. If the second operand is CONST1_RTX,
>	canonicalize to use CONSTM1_RTX instead.
>
> gcc/testsuite/ChangeLog
>	* gcc.target/i386/avx512f-simd-1.c: Tweak test case.
>	* gcc.target/i386/sse2-paddb-2.c: New test case.
>	* gcc.target/i386/sse2-paddd-2.c: Likewise.
>	* gcc.target/i386/sse2-paddw-2.c: Likewise.
>	* gcc.target/i386/sse2-psubb-2.c: Likewise.
>	* gcc.target/i386/sse2-psubd-2.c: Likewise.
>	* gcc.target/i386/sse2-psubw-2.c: Likewise.
>
>
> Thanks in advance,
> Roger