On Mon, Oct 20, 2025 at 5:59 PM Roger Sayle <[email protected]> wrote:
>
>
> Hi Uros and H.J.,
> Here's an old patch that I never got around to posting due to stage
> restrictions last year (or the year before).
>
> Currently x86_64's TImode STV pass has the restriction that candidate
> chains must start with a TImode load from memory. This patch improves
> the functionality of STV to allow zero-extensions and construction of
> TImode pseudos from two DImode values (i.e. *concatditi) to both be
> considered candidate chain initiators. For example, this allows chains
> starting from an __int128 function argument to be processed by STV.
>
> Compiled with -O2 on x86_64:
>
> __int128 m0,m1,m2,m3;
> void foo(__int128 m)
> {
>     m0 = m;
>     m1 = m;
>     m2 = m;
>     m3 = m;
> }
>
> Previously generated:
>
> foo:    xchgq   %rdi, %rsi
>         movq    %rsi, m0(%rip)
>         movq    %rdi, m0+8(%rip)
>         movq    %rsi, m1(%rip)
>         movq    %rdi, m1+8(%rip)
>         movq    %rsi, m2(%rip)
>         movq    %rdi, m2+8(%rip)
>         movq    %rsi, m3(%rip)
>         movq    %rdi, m3+8(%rip)
>         ret
>
> With the patch, we now generate:
>
> foo:    movq    %rdi, %xmm0
>         movq    %rsi, %xmm1
>         punpcklqdq      %xmm1, %xmm0
>         movaps  %xmm0, m0(%rip)
>         movaps  %xmm0, m1(%rip)
>         movaps  %xmm0, m2(%rip)
>         movaps  %xmm0, m3(%rip)
>         ret
>
> or with -mavx2:
>
> foo:    vmovq   %rdi, %xmm1
>         vpinsrq $1, %rsi, %xmm1, %xmm0
>         vmovdqa %xmm0, m0(%rip)
>         vmovdqa %xmm0, m1(%rip)
>         vmovdqa %xmm0, m2(%rip)
>         vmovdqa %xmm0, m3(%rip)
>         ret
>
> Likewise, for zero-extension:
>
> __int128 m0,m1,m2,m3;
> void bar(unsigned long x)
> {
>     __int128 m = x;
>     m0 = m;
>     m1 = m;
>     m2 = m;
>     m3 = m;
> }
>
> Previously with -O2:
>
> bar:    movq    %rdi, m0(%rip)
>         movq    $0, m0+8(%rip)
>         movq    %rdi, m1(%rip)
>         movq    $0, m1+8(%rip)
>         movq    %rdi, m2(%rip)
>         movq    $0, m2+8(%rip)
>         movq    %rdi, m3(%rip)
>         movq    $0, m3+8(%rip)
>         ret
>
> with this patch:
>
> bar:    movq    %rdi, %xmm0
>         movaps  %xmm0, m0(%rip)
>         movaps  %xmm0, m1(%rip)
>         movaps  %xmm0, m2(%rip)
>         movaps  %xmm0, m3(%rip)
>         ret
>
>
> As shown in the examples above, the scalar-to-vector (STV) conversion of
> *concatditi has an overhead [treating two DImode registers as a TImode
> value is free on x86_64], but specifying this penalty allows the STV
> pass to make an informed decision if the total cost/gain of the chain
> is a net win.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2025-10-20  Roger Sayle  <[email protected]>
>
> gcc/ChangeLog
>         * config/i386/i386-features.cc (timode_concatdi_p): New
>         function to recognize the various variants of *concatditi3_[1-7].
>         function to determine the gain/cost on a CONST_WIDE_INT.
>         (scalar_chain::add_insn): Like VEC_SELECT, ZERO_EXTEND and
>         timode_concatdi_p instructions don't require their input
>         operands to be converted (to TImode).
>         (timode_scalar_chain::compute_convert_gain): Split/clone XOR and
>         IOR cases from AND case, to handle timode_concatdi_p costs.
>         <case PLUS>: Handle timode_concatdi_p conversion costs.
>         <case ZERO_EXTEND>: Provide costs of DImode to TImode extension.
>         (timode_convert_concatdi): Helper function to transform a
>         *concatditi3 instruction into a vec_concatv2di instruction.
>         (timode_scalar_chain::convert_insn): Split/clone XOR and IOR
>         cases from AND case, to handle timode_concatdi_p using the new
>         timode_convert_concatdi helper function.
>         <case ZERO_EXTEND>: Convert zero_extendditi2 to *vec_concatv2di_0.
>         <case PLUS>: Handle timode_concatdi_p using the new
>         timode_convert_concatdi helper function.
>         (timode_scalar_to_vector_candidate_p): Support timode_concatdi_p
>         instructions in IOR, XOR and PLUS cases.
>         <case ZERO_EXTEND>: Consider zero extension of a register from
>         DImode to TImode to be a candidate.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse4_1-stv-10.c: New test case.
>         * gcc.target/i386/sse4_1-stv-11.c: Likewise.
>         * gcc.target/i386/sse4_1-stv-12.c: Likewise.

I didn't check gains in compute_convert_gain in detail, but they look
reasonable, and you have much more experience here. As shown by
attached testcases, this functionality is a nice addition to the STV
pass.

The patch is OK.

Thanks,
Uros.
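
For anyone wanting to reproduce the before/after assembly quoted in the patch description, the two examples can be combined into one file roughly as follows. This is only a sketch: foo and bar are taken from the description, while the noinline attributes and the stv_selfcheck helper are hypothetical additions for illustration and are not claimed to match the contents of the new sse4_1-stv-*.c tests. It assumes a target with __int128 and a 64-bit unsigned long, such as x86_64 Linux.

```c
/* Sketch only: foo and bar come from the patch description; the
   noinline attributes and stv_selfcheck are illustrative additions.  */

__int128 m0, m1, m2, m3;

/* Chain starting from an __int128 argument, i.e. a TImode value
   built from two DImode registers (*concatditi).  */
__attribute__((noinline)) void foo(__int128 m)
{
    m0 = m;
    m1 = m;
    m2 = m;
    m3 = m;
}

/* Chain starting from a DImode to TImode zero-extension.  */
__attribute__((noinline)) void bar(unsigned long x)
{
    __int128 m = x;
    m0 = m;
    m1 = m;
    m2 = m;
    m3 = m;
}

/* Return 1 if all four globals hold the value V, else 0.  */
static int all_equal(__int128 v)
{
    return m0 == v && m1 == v && m2 == v && m3 == v;
}

/* Hypothetical helper: checks that both functions store the expected
   value whichever instruction sequence the compiler picks.
   Returns 1 on success, 0 on failure.  */
int stv_selfcheck(void)
{
    __int128 big = ((__int128)0x0123456789abcdefULL << 64)
                   | 0xfedcba9876543210ULL;

    foo(big);
    if (!all_equal(big))
        return 0;

    bar(42UL);
    if (!all_equal((__int128)42))
        return 0;

    return 1;
}
```

Compiling this with -O2 -S on x86_64, with and without the patch, should make it possible to compare the store sequences in foo and bar against the listings quoted above. The exact options used by the new tests are not shown in this thread.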
