Thanks Jeff for comments. > So we've got 3 patches all touching on the same basic area, so we need > to be careful about staging in.
Agree, thanks Jeff for paying attention. > So don't be surprised if most review time is focused on how the costing > model works since that's a common theme across these patches. I see, Robin also mentioned this last year. > This feels very much like a hack to me. Why not just handle > VEC_DUPLICATE like the rest of the opcodes in riscv_rtx_costs? > Ultimately all that code needs to work together rather than having > separate paths for vector/scalar. The idea is to separate vector related code into another sub function for readability, instead of unroll the vector cost logic in riscv_rtx_costs. Given the riscv_rtx_costs function body is quite long already. Currently we may have Vec_dup but it may introduce more cases. I am totally ok if we think unroll the logic into riscv_rtx_costs is a better choice. > I don't particularly like how this interacts with costing. Rather than > testing GR2VR like you've done, the better solution is to add the > appropriate code to riscv_rtx_costs to actually cost the vector insn as > a whole. Part of that cost computation would be to look at the operand > and do something sensible with it (which would likely reference the cost > to move data between units such as GR2VR). If you do that, then the > standard costing tests in combine should get used. Make sense to me, make cost related things into riscv_rtx_costs. I will remove the get_vector_costs ()->regmove->GR2VR check and send v2 if no surprise from test. > Did you plan to support any other binary ops? ISTM that any binary op > that provides a .vx form could be handled here. Yes, all binary ops (except widen and narrow) should go here after this series. With this we could support other binary ops more easily and efficient. Pan -----Original Message----- From: Jeff Law <jeffreya...@gmail.com> Sent: Friday, April 18, 2025 10:34 PM To: Li, Pan2 <pan2...@intel.com>; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp....@gmail.com; Chen, Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Paul-Antoine Arras <par...@baylibre.com>; Alexandre Oliva <ol...@adacore.com> Subject: Re: [PATCH 1/3][GCC16-Stage-1] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost On 4/17/25 1:30 AM, pan2...@intel.com wrote: > From: Pan Li <pan2...@intel.com> > > This patch would like to combine the vec_duplicate + vadd.vv to the > vadd.vx. From example as below code. The related pattern will depend > on the cost of vec_duplicate from GR2VR, it will: > > * The pattern matching will be inactive if GR2VR cost is zero. > * The cost of GR2VR will be added to the total cost of pattern, and > the late-combine will decide to perform the replacement or not > based on the cost value. [ ... ] So we've got 3 patches all touching on the same basic area, so we need to be careful about staging in. Alex's patch is mostly focused on improving pred_broadcast and is probably independent except perhaps potentially wanting to use costing to guide code generation. Paul-Antoine's patch is focused on multiply-add type instructions. So no real conflicts on the patterns, but there's likely overlap on costing models. And finally this patch which focuses on simple integer arithmetic and some costing model work So don't be surprised if most review time is focused on how the costing model works since that's a common theme across these patches. > > gcc/ChangeLog: > > * config/riscv/autovec-opt.md (*<optab>_vx_<mode>): Add new > combine to convert vec_duplicate + vadd.vv to vaddvx on GR2VR > cost. > * config/riscv/riscv.cc (riscv_rtx_costs): Extract vector > cost into a separated func. > (riscv_vector_rtx_costs): Add new func to take care of the > cost of vector rtx, default to 1 and append GR2VR cost to > vec_duplicate rtx. > * config/riscv/vector-iterators.md: Add new iterator for vx. > > Signed-off-by: Pan Li <pan2...@intel.com> > --- > gcc/config/riscv/autovec-opt.md | 22 ++++++++++++++++++++++ > gcc/config/riscv/riscv.cc | 26 ++++++++++++++++++++------ > gcc/config/riscv/vector-iterators.md | 4 ++++ > 3 files changed, 46 insertions(+), 6 deletions(-) > > diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md > index 0c3b0cc7e05..ab45fe2511b 100644 > --- a/gcc/config/riscv/autovec-opt.md > +++ b/gcc/config/riscv/autovec-opt.md > @@ -1673,3 +1673,25 @@ (define_insn_and_split "*vandn_<mode>" > DONE; > } > [(set_attr "type" "vandn")]) > + > +;; > ============================================================================= > +;; Combine vec_duplicate + op.vv to op.vx > +;; Include > +;; - vadd.vx > +;; > ============================================================================= > +(define_insn_and_split "*<optab>_vx_<mode>" > + [(set (match_operand:V_VLSI 0 "register_operand") > + (any_int_binop_no_shift_vx:V_VLSI > + (vec_duplicate:V_VLSI > + (match_operand:<VEL> 1 "register_operand")) > + (match_operand:V_VLSI 2 "<binop_rhs2_predicate>")))] > + "TARGET_VECTOR && can_create_pseudo_p () && get_vector_costs > ()->regmove->GR2VR != 0" > + "#" > + "&& 1" > + [(const_int 0)] > + { > + rtx ops[] = {operands[0], operands[2], operands[1]}; > + riscv_vector::emit_vlmax_insn (code_for_pred_scalar (<CODE>, <MODE>mode), > + riscv_vector::BINARY_OP, ops); > + } > + [(set_attr "type" "vialu")]) So there's a bit of a natural tension between generating better code early and waiting for combine in this space. As we've discussed broadcasting across the vector in the loop header may perform better than using a .vx/.vf form in a loop where we have to pay the cost to transfer data between units. We could attack this problem in the expander and that would be my strong preference in the scalar space. But it's less clear for vector, in particular it's unclear if we'd perform LICM and broadcast the value across the vector in a loop pre-header if we've already embedded the value as a GPR/FPR in the vector instruction. I don't particularly like how this interacts with costing. Rather than testing GR2VR like you've done, the better solution is to add the appropriate code to riscv_rtx_costs to actually cost the vector insn as a whole. Part of that cost computation would be to look at the operand and do something sensible with it (which would likely reference the cost to move data between units such as GR2VR). If you do that, then the standard costing tests in combine should get used. > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 38f3ae7cd84..9bd0dbcf5f6 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -3856,16 +3856,30 @@ riscv_extend_cost (rtx op, bool unsigned_p) > #define SINGLE_SHIFT_COST 1 > > static bool > -riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno > ATTRIBUTE_UNUSED, > - int *total, bool speed) > +riscv_vector_rtx_costs (rtx x, machine_mode mode, int *total) > { > + gcc_assert (riscv_v_ext_mode_p (mode)); > + > /* TODO: We set RVV instruction cost as 1 by default. > Cost Model need to be well analyzed and supported in the future. */ > + int cost_val = 1; > + enum rtx_code rcode = GET_CODE (x); > + > + /* Aka (vec_duplicate:RVVM1DI (reg/v:DI 143 [ x ])) */ > + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0)))) > + cost_val += get_vector_costs ()->regmove->GR2VR; > + > + *total = COSTS_N_INSNS (cost_val); > + > + return true; > +} > + > +static bool > +riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno > ATTRIBUTE_UNUSED, > + int *total, bool speed) > +{ > if (riscv_v_ext_mode_p (mode)) > - { > - *total = COSTS_N_INSNS (1); > - return true; > - } > + return riscv_vector_rtx_costs (x, mode, total); This feels very much like a hack to me. Why not just handle VEC_DUPLICATE like the rest of the opcodes in riscv_rtx_costs? Ultimately all that code needs to work together rather than having separate paths for vector/scalar. > > +(define_code_iterator any_int_binop_no_shift_vx > + [plus > +]) > + Did you plan to support any other binary ops? ISTM that any binary op that provides a .vx form could be handled here. Jeff