On Thu, May 7, 2026 at 2:04 PM H.J. Lu <[email protected]> wrote:
>
> On Wed, May 6, 2026 at 10:17 PM Hongtao Liu <[email protected]> wrote:
> >
> > On Mon, May 4, 2026 at 9:08 PM H.J. Lu <[email protected]> wrote:
> > >
> > > On Mon, May 4, 2026 at 8:36 PM Liu, Hongtao <[email protected]> wrote:
> > > >
> > > > Could you rebase the patch to latest trunk, I failed to git am your
> > > > patch.
> > > >
> > > > > }
> > > > >
> > > > >- /* NB: CONST_VECTOR load is generated and handled in x86_cse. */
> > > > >- if (load
> > > > >- && !CONST_VECTOR_P (src)
> > > > >- && load->kind == X86_CSE_VEC_DUP)
> > > > >+ if (load && load->kind == X86_CSE_VEC_DUP)
> > > >
> > > > It looks like the base is without your former patch[1], but the patch
> > > > should be incremental to that.
> > >
> > > Here is the rebased patch.
> > >
> > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2026-April/715240.html
> > > >
> > > > > -----Original Message-----
> > > > > From: H.J. Lu <[email protected]>
> > > > > Sent: Thursday, April 30, 2026 10:07 PM
> > > > > To: GCC Patches <[email protected]>; Uros Bizjak
> > > > > <[email protected]>; Liu, Hongtao <[email protected]>
> > > > > Subject: [PATCH] x86_cse: Add X86_CSE_CONST_VECTOR
> > > > >
> > > > > Add X86_CSE_CONST_VECTOR for native CONST_VECTOR:
> > > > >
> > > > > (insn 25 23 234 4 (set (reg:V16QI 135)
> > > > > (const_vector:V16QI [
> > > > > (const_int -1 [0xffffffffffffffff]) repeated x16
> > > > > ])) "bar-2.c":10:16 discrim 67584 2453 {movv16qi_internal}
> > > > > (nil))
> > > > >
> > > > > and constant integer load:
> > > > >
> > > > > (insn 280 8 279 2 (set (subreg:HI (reg:V2QI 172) 0)
> > > > > (const_int -1 [0xffffffffffffffff])) -1
> > > > > (nil))
> > > > > ...
> > > > > (insn 110 39 194 9 (set (reg:V2QI 147)
> > > > > (reg:V2QI 172)) 2089 {*movv2qi_internal}
> > > > > (expr_list:REG_EQUAL (const_vector:V2QI [
> > > > > (const_int -1 [0xffffffffffffffff]) repeated x2
> > > > > ])
> > > > > (nil)))
> > > > >
> > > > > converted from
> > > > >
> > > > > (insn 111 87 121 18 (set (reg:V2QI 147)
> > > > > (mem/u/c:V2QI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S2
> > > > > A16])) 2089 {*movv2qi_internal}
> > > > > (expr_list:REG_EQUAL (const_vector:V2QI [
> > > > > (const_int -1 [0xffffffffffffffff]) repeated x2])
> > > > > (nil)))
> > > > >
> > > > > Keep redundant constant integer load when crossing a function call
> > > > > since it is
> > > > > faster than save and restore an integer register.
> > > > >
> > > > > Convert CONST_VECTOR load no larger than integer register to constant
> > > > > integer load even if there is no redundant CONST_VECTOR load.
> > > > >
> > > > > Tested on Linux/x86-64 and Linux/i686.
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR target/125100
> > > > > * config/i386/i386-features.cc (x86_cse_kind): Add
> > > > > X86_CSE_CONST_VECTOR.
> > > > > (redundant_pattern): Add dest_mode.
> > > > > (ix86_place_single_vector_set): Handle X86_CSE_CONST_VECTOR.
> > > > > Generate SUBREG for constant integer source.
> > > > > (ix86_broadcast_inner): Add an INSN argument. Check REG_EQUAL notes
> > > > > for
> > > > > CONST_VECTOR. Set load kind to X86_CSE_CONST_VECTOR for native and
> > > > > converted CONST_VECTORs. Return CONST_VECTOR if it can be converted
> > > > > to
> > > > > constant integer load.
> > > > > (pass_x86_cse::candidate_vector_p): Add an INSN argument and pass the
> > > > > insn to ix86_broadcast_inner.
> > > > > (pass_x86_cse::x86_cse): Add a basic block bitmap for calls.
> > > > > Pass the insn to candidate_vector_p. Handle X86_CSE_CONST_VECTOR.
> > > > > Set dest_mode. Keep redundant constant integer load when crossing a
> > > > > function call. Convert CONST_VECTOR load no larger than integer
> > > > > register to
> > > > > constant integer load even if there is no redundant CONST_VECTOR load.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > > PR target/125100
> > > > > * gcc.target/i386/pr125100-1.c: New test.
> > > > > * gcc.target/i386/pr125100-2.c: Likewise.
> > > > > * gcc.target/i386/pr125100-3.c: Likewise.
> > > > >
> > > > >
> > > > > --
> > > > > H.J.
> > >
> > >
> > >- else if (CONST_VECTOR_P (op))
> > >+ else
> > > {
> > >- rtx first = XVECEXP (op, 0, 0);
> > >- for (int i = 1; i < nunits; ++i)
> > >+ rtx equal;
> > >+ bool int_load_p = false;
> > >+ if (CONST_VECTOR_P (op))
> > > {
> > >- rtx tmp = XVECEXP (op, 0, i);
> > >- /* Vector duplicate value. */
> > >- if (!rtx_equal_p (tmp, first))
> > >- return nullptr;
> > >+ /* CONST_VECTOR is supported natively. */
> > >+ *kind_p = X86_CSE_CONST_VECTOR;
> > >+ int_load_p = GET_MODE_SIZE (mode) <= UNITS_PER_WORD;
> > >+ equal = op;
> > > }
> > >- /* Use the inner mode to handle
> > >- (const_vector:V2QI [(const_int 0 [0]) repeated x2])
> > >- */
> > >- *scalar_mode_p = GET_MODE_INNER (mode);
> > >- *insn_p = nullptr;
> > >- return first;
> > >+ else
> > >+ {
> > >+ /* Check CONST_VECTOR load which can be converted to constant
> > >+ integer load. */
> > >+ equal = find_reg_equal_equiv_note (*insn_p);
> >
> > Could you funnel non-CONST_VECTOR cases through REG_EQUAL, then run
> > one shared CONST_VECTOR path for int_load_p or const vec duplicate.
> > Also add some comments to ix86_broadcast_inner that the const_vector
> > will be returned if GET_MODE_SIZE (mode) <= UNITS_PER_WORD.
>
> Changed in the v2 patch to check REG_EQUAL note first.
>
> > ...
> >
> > >- if (!set && !CALL_P (insn))
> > >- continue;
> > >+ if (!set)
> > >+ {
> > >+ if (CALL_P (insn))
> > >+ bitmap_set_bit (call_bbs, BLOCK_FOR_INSN (insn)->index);
> >
> > There could be call_p (insn) but still a single set? and that
> > call_insn will be missed by upper code.
>
> Changed to
>
> bool call_p = CALL_P (insn);
> rtx set = single_set (insn);
> if (!set && !call_p)
> continue;
>
> tlsdesc_val = nullptr;
>
> attr_tls64 tls64 = get_attr_tls64 (insn);
>
> /* NB: TLS calls preserve all registers. */
> if (call_p && tls64 == TLS64_NONE)
> bitmap_set_bit (call_bbs, BLOCK_FOR_INSN (insn)->index);
>
> > >+ else
> > >+ continue;
> > >+ }
> >
> > ...
> >
> > >+ bool keep_redundant_load = false;
> >
> > When load->count == 1, I think we also want replace const_vector with
> > integer load, so better with
> > keep_redundant_load = load->count == 1;
>
> The variable name may be confusing since 1 constant integer load
> is also converted:
For load->count == 1, it converted with vector mode, not directly
integer load from imm
Replace:
(insn 58 40 64 9 (set (reg:V2QI 120)
(mem/u/c:V2QI (symbol_ref/u:SI ("*.LC1") [flags 0x2]) [0 S2
A16])) 2081 {*movv2qi_internal}
(expr_list:REG_EQUAL (const_vector:V2QI [
(const_int 1 [0x1]) repeated x2
])
(nil)))
with:
(insn 58 40 64 9 (set (reg:V2QI 120)
(reg:V2QI 131)) 2081 {*movv2qi_internal}
(expr_list:REG_EQUAL (const_vector:V2QI [
(const_int 1 [0x1]) repeated x2
])
(nil)))
deferring rescan insn with uid = 58.
deferring rescan insn with uid = 141.
Place:
(insn 141 40 58 9 (set (subreg:HI (reg:V2QI 131) 0)
(const_int 257 [0x101])) -1
(nil))
after:
(note 40 39 141 9 [bb 9] NOTE_INSN_BASIC_BLOCK)
Why shouldn't we directly replace it with
(insn 58 40 64 9 (set (subreg: HI(reg:V2QI 120) 0)
(const_int 257 [0x101]))
>
> /* Convert CONST_VECTOR load no larger than integer register
> to constant integer load even if there is no redundant
> CONST_VECTOR load. */
> if (CONST_VECTOR_P (val))
> threshold = 1;
>
> I renamed it to keep_const_int_load.
>
> > >+ if (!bitmap_empty_p (call_bbs))
> > >+ {
> > >+ bitmap_iterator bi;
> > >+ unsigned int id;
> > >+ EXECUTE_IF_SET_IN_BITMAP (load->bbs, 0, id, bi)
> > >+ if (bitmap_bit_p (call_bbs, id))
> > >+ {
> > >+ /* NB: Redundant constant integer load is
> > >+ faster than save and restore an integer
> > >+ register when crossing a function call.
> > >+ */
> > >+ keep_redundant_load = true;
> > >+ break;
> >
> > >
>
> I am testing this v2 patch. OK for master if there are no regressions?
>
> Thanks.
>
> --
> H.J.
> ---
> Add X86_CSE_CONST_VECTOR for native CONST_VECTOR:
>
> (insn 25 23 234 4 (set (reg:V16QI 135)
> (const_vector:V16QI [
> (const_int -1 [0xffffffffffffffff]) repeated x16
> ])) "bar-2.c":10:16 discrim 67584 2453 {movv16qi_internal}
> (nil))
>
> and constant integer load:
>
> (insn 280 8 279 2 (set (subreg:HI (reg:V2QI 172) 0)
> (const_int -1 [0xffffffffffffffff])) -1
> (nil))
> ...
> (insn 110 39 194 9 (set (reg:V2QI 147)
> (reg:V2QI 172)) 2089 {*movv2qi_internal}
> (expr_list:REG_EQUAL (const_vector:V2QI [
> (const_int -1 [0xffffffffffffffff]) repeated x2
> ])
> (nil)))
>
> converted from
>
> (insn 111 87 121 18 (set (reg:V2QI 147)
> (mem/u/c:V2QI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S2
> A16])) 2089 {*movv2qi_internal}
> (expr_list:REG_EQUAL (const_vector:V2QI [
> (const_int -1 [0xffffffffffffffff]) repeated x2])
> (nil)))
>
> 1. Use CONST_VECTOR in REG_EQUAL note to avoid DF chain.
> 2. Keep constant integer load when crossing a function call since it is
> faster than save and restore an integer register.
> 3. Convert CONST_VECTOR load no larger than integer register to constant
> integer load even if there is no redundant CONST_VECTOR load.
>
> Tested on Linux/x86-64 and Linux/i686.
>
> gcc/
>
> PR target/125100
> * config/i386/i386-features.cc (x86_cse_kind): Add
> X86_CSE_CONST_VECTOR.
> (redundant_pattern): Add dest_mode.
> (ix86_place_single_vector_set): Handle X86_CSE_CONST_VECTOR.
> Generate SUBREG for constant integer source.
> (ix86_broadcast_inner): Add an INSN argument. Use CONST_VECTOR
> in REG_EQUAL note. Set load kind to X86_CSE_CONST_VECTOR for
> native and converted CONST_VECTORs. Return CONST_VECTOR if it
> can be converted to constant integer load.
> (pass_x86_cse::candidate_vector_p): Add an INSN argument and
> pass the insn to ix86_broadcast_inner.
> (pass_x86_cse::x86_cse): Add a basic block bitmap for calls.
> Pass the insn to candidate_vector_p. Handle X86_CSE_CONST_VECTOR.
> Set dest_mode. Keep constant integer load when crossing a
> function call. Convert CONST_VECTOR load no larger than integer
> register to constant integer load even if there are no redundant
> CONST_VECTOR loads.
>
> gcc/testsuite/
>
> PR target/125100
> * gcc.target/i386/pr125100-1.c: New test.
> * gcc.target/i386/pr125100-2.c: Likewise.
> * gcc.target/i386/pr125100-3.c: Likewise.
> * gcc.target/i386/pr125100-4.c: Likewise.
--
BR,
Hongtao