On Sat, May 9, 2026 at 3:04 PM Hongtao Liu <[email protected]> wrote:
>
> On Sat, May 9, 2026 at 10:28 AM H.J. Lu <[email protected]> wrote:
> >
> > On Sat, May 9, 2026 at 10:20 AM H.J. Lu <[email protected]> wrote:
> > >
> > > On Sat, May 9, 2026 at 9:10 AM Liu, Hongtao <[email protected]> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: H.J. Lu <[email protected]>
> > > > > Sent: Saturday, May 9, 2026 7:57 AM
> > > > > To: GCC Patches <[email protected]>; Uros Bizjak
> > > > > <[email protected]>; Liu, Hongtao <[email protected]>
> > > > > Subject: [PATCH] x86_cse: Check CONST0_RTX and CONSTM1_RTX for
> > > > > X86_CSE_VEC_DUP
> > > > >
> > > > > Check CONST0_RTX and CONSTM1_RTX when placing
> > > > >
> > > > > (insn 32 2 7 2 (set (reg:V2DI 114)
> > > > > (const_vector:V2DI [
> > > > > (const_int 0 [0]) repeated x2
> > > > > ])) -1
> > > > > (nil))
> > > > >
> > > > > after
> > > > >
> > > > > (note 2 3 32 2 NOTE_INSN_FUNCTION_BEG)
> > > > >
> > > > > for X86_CSE_VEC_DUP, not X86_CSE_CONST0_VECTOR or
> > > > > X86_CSE_CONSTM1_VECTOR, after replacing redundant vector loads:
> > > > >
> > > > > (insn 31 15 16 2 (set (reg/v/f:DI 99 [ d ])
> > > > > (const_int 0 [0])) "x.c":5:16 -1
> > > > > (nil))
> > > > > ...
> > > > > (insn 18 17 19 2 (set (reg:V2DI 111 [ _22 ])
> > > > > (vec_duplicate:V2DI (reg/v/f:DI 99 [ d ]))) "x.c":5:16 9345
> > > > > {*vec_dupv2di}
> > > > > (nil))
> > > > >
> > > > > ...
> > > > > (insn 29 12 15 2 (set (reg/v/f:DI 98 [ c ])
> > > > > (const_int 0 [0])) "x.c":5:16 -1
> > > > > (nil))
> > > > > ...
> > > > > (insn 20 19 21 2 (set (reg:V2DI 112 [ _20 ])
> > > > > (vec_duplicate:V2DI (reg/v/f:DI 98 [ c ]))) "x.c":5:16 9345
> > > > > {*vec_dupv2di}
> > > > > (nil))
> > > > >
> > > > > with
> > > > >
> > > > > (insn 18 17 19 2 (set (reg:V2DI 111 [ _22 ])
> > > > > (reg:V2DI 114)) "x.c":5:16 2454 {movv2di_internal}
> > > > > (nil))
> > > > >
> > > > > and
> > > > >
> > > > > (insn 20 19 21 2 (set (reg:V2DI 112 [ _20 ])
> > > > > (reg:V2DI 114)) "x.c":5:16 2454 {movv2di_internal}
> > > > > (nil))
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR target/125239
> > > > > * config/i386/i386-features.cc (ix86_place_single_vector_set):
> > > > > Check CONST0_RTX and CONSTM1_RTX for X86_CSE_VEC_DUP.
> > > >
> > > > Can we detect it in ix86_broadcast_inner, set *kind_p to
> > > > X86_CSE_CONST0_VECTOR, instead of handle it in
> > > > ix86_place_single_vector_set.
> > >
> > > Done. I am testing this patch.
> >
> > The condition should be
> >
> > else if (CONST_VECTOR_P (src))
> > {
> > /* The only possible CONST_VECTORs of SRC are CONST0_RTX and
> > CONSTM1_RTX. Otherwise,
> >
> > rtx set = gen_rtx_SET (dest, src);
> >
> > won't be a valid instruction. */
> > machine_mode mode = GET_MODE (dest);
> > if (!((src == CONST0_RTX (mode)
> > && load->kind == X86_CSE_CONST0_VECTOR)
> > || (src == CONSTM1_RTX (mode)
> > && load->kind == X86_CSE_CONSTM1_VECTOR)))
> > gcc_unreachable ();
>
> I think for CONST_VECTOR size > UNITS_PER_WORD, we now constructed the
> const_vector(line and assigned it to broadcast_source)
>
> 4910 else
> 4911 {
> 4912 int nunits = GET_MODE_NUNITS (mode);
> 4913 rtvec v = rtvec_alloc (nunits);
> 4914 for (int j = 0; j < nunits ; j++)
> 4915 RTVEC_ELT (v, j) = load->val;
> 4916 broadcast_source = gen_rtx_CONST_VECTOR (mode, v);
> 4917 }
>
> And it will be passed to ix86_place_single_vector_set and cause
> invalid insn, it's a bug that needs to be fixed.
It shouldn't happen since
case X86_CSE_VEC_DUP:
if (CONST_INT_P (load->val)
&& (load->val == CONST0_RTX (load->mode)
|| load->size <= UNITS_PER_WORD))
{
/* Generate CONST_VECTOR load. */
case X86_CSE_CONST_VECTOR:
mode = ix86_get_vector_cse_mode (load->size,
load->mode);
if (CONST_VECTOR_P (load->val))
broadcast_source = load->val;
else if (load->val == CONST0_RTX (load->mode))
broadcast_source = CONST0_RTX (mode);
else if (load->val == CONSTM1_RTX (load->mode))
broadcast_source = CONSTM1_RTX (mode);
else
{
int nunits = GET_MODE_NUNITS (mode);
rtvec v = rtvec_alloc (nunits);
for (int j = 0; j < nunits ; j++)
RTVEC_ELT (v, j) = load->val;
broadcast_source = gen_rtx_CONST_VECTOR (mode, v);
}
For X86_CSE_VEC_DUP, it is either CONST0_RTX
or load->size <= UNITS_PER_WORD. For X86_CSE_CONST_VECTOR,
only native CONST_VECTOR is allowed.
>
>
> > }
> >
> > > > Also, I wonder why pass_combine(or fwprop) doesn't catch this miss
> > > > optimization. Set with CONST0_VECTOR should be cheaper than with
> > > > vec_duplicate.
> > >
> > > Because of -fno-tree-dse -fno-tree-dce?
> > >
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > > PR target/125239
> > > > > * gcc.target/i386/pr125239.c: New test.
> > > > >
> > > > >
> > > > > --
> > > > > H.J.
> > >
> > >
> > >
> > > --
> > > H.J.
> >
> >
> >
> > --
> > H.J.
>
>
>
> --
> BR,
> Hongtao
--
H.J.