Thanks Robin for comments.

> Where does the actual HI->SI extension happen then? No chance we see it
> during combine/late-combine?

There is no HI->SI extension in either the 272.expand dump or the combine
dump, so either there is no change in the RTL or it is unsafe here. I think
we need an additional fix to make it work; my guess is that it is related to
integer promotion.

> first :)

Oops, will fix the typo.

> Technically, shouldn't it be the other way around? Like first extend and
> then broadcast?

In theory yes; maybe I am overthinking here. If the two orders are
equivalent, bringing the data into the vector registers first may offer
better opportunities in an RVV environment. It is totally fine to keep the
original semantics if this is useless or incorrect.

Pan

-----Original Message-----
From: Robin Dapp <rdapp....@gmail.com>
Sent: Saturday, September 13, 2025 1:02 PM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp....@gmail.com; Chen, Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>
Subject: Re: [PATCH v1 1/4] RISC-V: Combine vec_duplicate + vwaddu.vv to vwaddu.vx on GR2VR cost

Hi Pan,

> The pattern of this patch only works on DImode, aka below pattern.
> v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
>   + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
>
> Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we lose
> this extend op after expand.
>
> For uint16_t => uint32_t we have:
> (set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
>
> For uint32_t => uint64_t we have:
> (set (reg:DI 148 [ _6 ])
>   (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
>
> We can see there is no zero_extend for uint16_t to uint32_t, and we
> cannot hit the pattern above. So the combine will try below pattern
> for uint16_t to uint32_t.
>
> v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
>   + (vec_dup:RVVM1SImode (subreg:SIMode (:DImode x2:SImode)))
>
> But it cannot match the vwaddu semantics, thus we need another handling
> for the vwaddu.vv for uint16_t to uint32_t, as well as the uint8_t to
> uint16_t.

Where does the actual HI->SI extension happen then? No chance we see it
during combine/late-combine?

> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 02f19bc6a42..fefd2dc63c3 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -1868,6 +1868,50 @@ (define_insn_and_split "*mul_minus_vx_<mode>"
>    }
>    [(set_attr "type" "vimuladd")])
>
> +(define_insn_and_split "*widen_frist_<any_extend:su>_vx_<mode>"

first :)

> +  [(set (match_operand:VWEXTI_D 0 "register_operand")
> +    (vec_duplicate:VWEXTI_D
> +      (any_extend:<VEL>
> +        (match_operand:<VSUBEL> 1 "register_operand"))))]
> +  "TARGET_VECTOR && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +    machine_mode d_trunc_mode = <V_DOUBLE_TRUNC>mode;
> +    rtx vec_dup = gen_reg_rtx (d_trunc_mode);
> +    insn_code icode = code_for_pred_broadcast (d_trunc_mode);
> +    rtx vec_dup_ops[] = {vec_dup, operands[1]};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, vec_dup_ops);
> +
> +    icode = code_for_pred_vf2 (<any_extend:CODE>, <MODE>mode);
> +    rtx extend_ops[] = {operands[0], vec_dup};
> +    riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, extend_ops);

Technically, shouldn't it be the other way around? Like first extend and
then broadcast?

--
Regards
 Robin
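
As an aside on the ordering question, here is a minimal scalar C sketch of the two orders being discussed. It is not part of the patch and does not model the RTL or any RVV instruction. It only illustrates that, for zero extension from 32 to 64 bits, broadcasting and then extending each lane gives the same lanes as extending the scalar and then broadcasting. The lane count `LANES` and all function names are hypothetical, chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* hypothetical lane count, for illustration only */

/* Order used by the split in the quoted patch: broadcast the narrow
   scalar into a narrow vector, then zero-extend every lane.  */
void broadcast_then_extend(uint32_t x, uint64_t out[LANES])
{
    uint32_t narrow[LANES];
    for (size_t i = 0; i < LANES; i++)
        narrow[i] = x;
    for (size_t i = 0; i < LANES; i++)
        out[i] = (uint64_t)narrow[i];
}

/* Order suggested in review: zero-extend the scalar once, then
   broadcast the wide value.  */
void extend_then_broadcast(uint32_t x, uint64_t out[LANES])
{
    uint64_t wide = (uint64_t)x;
    for (size_t i = 0; i < LANES; i++)
        out[i] = wide;
}

/* Returns 1 if both orders produce identical lanes for x, else 0.  */
int orders_agree(uint32_t x)
{
    uint64_t a[LANES], b[LANES];
    broadcast_then_extend(x, a);
    extend_then_broadcast(x, b);
    for (size_t i = 0; i < LANES; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

In this model the results are lane-for-lane identical, so any preference between the two orders would come from cost or from which form later passes can match, not from the values computed.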