> -----Original Message-----
> From: H.J. Lu <[email protected]>
> Sent: Saturday, May 9, 2026 7:57 AM
> To: GCC Patches <[email protected]>; Uros Bizjak
> <[email protected]>; Liu, Hongtao <[email protected]>
> Subject: [PATCH] x86_cse: Check CONST0_RTX and CONSTM1_RTX for
> X86_CSE_VEC_DUP
> 
> Check CONST0_RTX and CONSTM1_RTX when placing
> 
> (insn 32 2 7 2 (set (reg:V2DI 114)
>         (const_vector:V2DI [
>                 (const_int 0 [0]) repeated x2
>             ])) -1
>      (nil))
> 
> after
> 
> (note 2 3 32 2 NOTE_INSN_FUNCTION_BEG)
> 
> for X86_CSE_VEC_DUP, not X86_CSE_CONST0_VECTOR or
> X86_CSE_CONSTM1_VECTOR, after replacing redundant vector loads:
> 
> (insn 31 15 16 2 (set (reg/v/f:DI 99 [ d ])
>         (const_int 0 [0])) "x.c":5:16 -1
>      (nil))
> ...
> (insn 18 17 19 2 (set (reg:V2DI 111 [ _22 ])
>         (vec_duplicate:V2DI (reg/v/f:DI 99 [ d ]))) "x.c":5:16 9345 {*vec_dupv2di}
>      (nil))
> 
> ...
> (insn 29 12 15 2 (set (reg/v/f:DI 98 [ c ])
>         (const_int 0 [0])) "x.c":5:16 -1
>      (nil))
> ...
> (insn 20 19 21 2 (set (reg:V2DI 112 [ _20 ])
>         (vec_duplicate:V2DI (reg/v/f:DI 98 [ c ]))) "x.c":5:16 9345 {*vec_dupv2di}
>      (nil))
> 
> with
> 
> (insn 18 17 19 2 (set (reg:V2DI 111 [ _22 ])
>         (reg:V2DI 114)) "x.c":5:16 2454 {movv2di_internal}
>      (nil))
> 
> and
> 
> (insn 20 19 21 2 (set (reg:V2DI 112 [ _20 ])
>         (reg:V2DI 114)) "x.c":5:16 2454 {movv2di_internal}
>      (nil))
> 
> gcc/
> 
> PR target/125239
> * config/i386/i386-features.cc (ix86_place_single_vector_set):
> Check CONST0_RTX and CONSTM1_RTX for X86_CSE_VEC_DUP.

Can we detect it in ix86_broadcast_inner and set *kind_p to
X86_CSE_CONST0_VECTOR, instead of handling it in ix86_place_single_vector_set?
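
As a rough illustration of that suggestion, here is a toy model, not GCC
code. The enum and the function are hypothetical stand-ins for the X86_CSE_*
kinds and for a check that would live where the broadcast source is
inspected. It assumes the scalar source is already known to be a constant:

```cpp
#include <cstdint>

// Hypothetical stand-ins for the X86_CSE_* kinds; not GCC's real enum.
enum class cse_kind { const0_vector, constm1_vector, vec_dup };

// Classify a broadcast whose scalar source is a known constant.
// A broadcast of 0 or -1 is recorded as the matching constant-vector
// kind up front, so the placement code would not need to re-check
// the value later.
inline cse_kind classify_broadcast(std::int64_t scalar)
{
  if (scalar == 0)
    return cse_kind::const0_vector;
  if (scalar == -1)
    return cse_kind::constm1_vector;
  return cse_kind::vec_dup;
}
```

With the kind decided at detection time, the placement step could treat all
three kinds uniformly. Whether that fits the real data flow in
i386-features.cc is the open question.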

Also, I wonder why pass_combine (or fwprop) doesn't catch this missed
optimization. A set with CONST0_VECTOR should be cheaper than one with
vec_duplicate.

> 
> gcc/testsuite/
> 
> PR target/125239
> * gcc.target/i386/pr125239.c: New test.
> 
> 
> --
> H.J.
