RE: [PATCH 1/3][GCC16-Stage-1] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

Li, Pan2 Fri, 18 Apr 2025 08:29:16 -0700

Thanks Jeff for comments.

> So we've got 3 patches all touching on the same basic area, so we need 
> to be careful about staging in.


Agree, thanks Jeff for paying attention.

> So don't be surprised if most review time is focused on how the costing 
> model works since that's a common theme across these patches.

I see, Robin also mentioned this last year.

> This feels very much like a hack to me.  Why not just handle 
> VEC_DUPLICATE like the rest of the opcodes in riscv_rtx_costs? 
> Ultimately all that code needs to work together rather than having 
> separate paths for vector/scalar.

The idea is to separate vector related code into another sub function
for readability, instead of unroll the vector cost logic in riscv_rtx_costs. 
Given the
riscv_rtx_costs function body is quite long already. Currently we may have
Vec_dup but it may introduce more cases.

I am totally ok if we think unroll the logic into riscv_rtx_costs is a better 
choice.

> I don't particularly like how this interacts with costing.  Rather than 
> testing GR2VR like you've done, the better solution is to add the 
> appropriate code to riscv_rtx_costs to actually cost the vector insn as 
> a whole.  Part of that cost computation would be to look at the operand 
> and do something sensible with it (which would likely reference the cost 
> to move data between units such as GR2VR).  If you do that, then the 
> standard costing tests in combine should get used.

Make sense to me, make cost related things into riscv_rtx_costs.
I will remove the get_vector_costs ()->regmove->GR2VR check and send v2 if no 
surprise from test.

> Did you plan to support any other binary ops?  ISTM that any binary op 
> that provides a .vx form could be handled here.

Yes, all binary ops (except widen and narrow) should go here after this series. 
With this we could support other binary ops more easily and efficient.

Pan

-----Original Message-----
From: Jeff Law <[email protected]> 
Sent: Friday, April 18, 2025 10:34 PM
To: Li, Pan2 <[email protected]>; [email protected]
Cc: [email protected]; [email protected]; [email protected]; Chen, Ken 
<[email protected]>; Liu, Hongtao <[email protected]>; Paul-Antoine Arras 
<[email protected]>; Alexandre Oliva <[email protected]>
Subject: Re: [PATCH 1/3][GCC16-Stage-1] RISC-V: Combine vec_duplicate + vadd.vv 
to vadd.vx on GR2VR cost



On 4/17/25 1:30 AM, [email protected] wrote:
> From: Pan Li <[email protected]>
> 
> This patch would like to combine the vec_duplicate + vadd.vv to the
> vadd.vx.  From example as below code.  The related pattern will depend
> on the cost of vec_duplicate from GR2VR, it will:
> 
> * The pattern matching will be inactive if GR2VR cost is zero.
> * The cost of GR2VR will be added to the total cost of pattern, and
>    the late-combine will decide to perform the replacement or not
>    based on the cost value.
[ ... ]
So we've got 3 patches all touching on the same basic area, so we need 
to be careful about staging in.

Alex's patch is mostly focused on improving pred_broadcast and is 
probably independent except perhaps potentially wanting to use costing 
to guide code generation.

Paul-Antoine's patch is focused on multiply-add type instructions.  So 
no real conflicts on the patterns, but there's likely overlap on costing 
models.

And finally this patch which focuses on simple integer arithmetic and 
some costing model work

So don't be surprised if most review time is focused on how the costing 
model works since that's a common theme across these patches.



> 
> gcc/ChangeLog:
> 
>       * config/riscv/autovec-opt.md (*<optab>_vx_<mode>): Add new
>       combine to convert vec_duplicate + vadd.vv to vaddvx on GR2VR
>       cost.
>       * config/riscv/riscv.cc (riscv_rtx_costs): Extract vector
>       cost into a separated func.
>       (riscv_vector_rtx_costs): Add new func to take care of the
>       cost of vector rtx, default to 1 and append GR2VR cost to
>       vec_duplicate rtx.
>       * config/riscv/vector-iterators.md: Add new iterator for vx.
> 
> Signed-off-by: Pan Li <[email protected]>
> ---
>   gcc/config/riscv/autovec-opt.md      | 22 ++++++++++++++++++++++
>   gcc/config/riscv/riscv.cc            | 26 ++++++++++++++++++++------
>   gcc/config/riscv/vector-iterators.md |  4 ++++
>   3 files changed, 46 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 0c3b0cc7e05..ab45fe2511b 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -1673,3 +1673,25 @@ (define_insn_and_split "*vandn_<mode>"
>       DONE;
>     }
>     [(set_attr "type" "vandn")])
> +
> +;; 
> =============================================================================
> +;; Combine vec_duplicate + op.vv to op.vx
> +;; Include
> +;; - vadd.vx
> +;; 
> =============================================================================
> +(define_insn_and_split "*<optab>_vx_<mode>"
> + [(set (match_operand:V_VLSI    0 "register_operand")
> +       (any_int_binop_no_shift_vx:V_VLSI
> +      (vec_duplicate:V_VLSI
> +        (match_operand:<VEL> 1 "register_operand"))
> +      (match_operand:V_VLSI  2 "<binop_rhs2_predicate>")))]
> +  "TARGET_VECTOR && can_create_pseudo_p () && get_vector_costs 
> ()->regmove->GR2VR != 0"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +  {
> +    rtx ops[] = {operands[0], operands[2], operands[1]};
> +    riscv_vector::emit_vlmax_insn (code_for_pred_scalar (<CODE>, <MODE>mode),
> +                                riscv_vector::BINARY_OP, ops);
> +  }
> +  [(set_attr "type" "vialu")])
So there's a bit of a natural tension between generating better code 
early and waiting for combine in this space.   As we've discussed 
broadcasting across the vector in the loop header may perform better 
than using a .vx/.vf form in a loop where we have to pay the cost to 
transfer data between units.

We could attack this problem in the expander and that would be my strong 
preference in the scalar space.  But it's less clear for vector, in 
particular it's unclear if we'd perform LICM and broadcast the value 
across the vector in a loop pre-header if we've already embedded the 
value as a GPR/FPR in the vector instruction.

I don't particularly like how this interacts with costing.  Rather than 
testing GR2VR like you've done, the better solution is to add the 
appropriate code to riscv_rtx_costs to actually cost the vector insn as 
a whole.  Part of that cost computation would be to look at the operand 
and do something sensible with it (which would likely reference the cost 
to move data between units such as GR2VR).  If you do that, then the 
standard costing tests in combine should get used.



> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 38f3ae7cd84..9bd0dbcf5f6 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3856,16 +3856,30 @@ riscv_extend_cost (rtx op, bool unsigned_p)
>   #define SINGLE_SHIFT_COST 1
>   
>   static bool
> -riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno 
> ATTRIBUTE_UNUSED,
> -              int *total, bool speed)
> +riscv_vector_rtx_costs (rtx x, machine_mode mode, int *total)
>   {
> +  gcc_assert (riscv_v_ext_mode_p (mode));
> +
>     /* TODO: We set RVV instruction cost as 1 by default.
>        Cost Model need to be well analyzed and supported in the future. */
> +  int cost_val = 1;
> +  enum rtx_code rcode = GET_CODE (x);
> +
> +  /* Aka (vec_duplicate:RVVM1DI (reg/v:DI 143 [ x ]))  */
> +  if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0))))
> +    cost_val += get_vector_costs ()->regmove->GR2VR;
> +
> +  *total = COSTS_N_INSNS (cost_val);
> +
> +  return true;
> +}
> +
> +static bool
> +riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno 
> ATTRIBUTE_UNUSED,
> +              int *total, bool speed)
> +{
>     if (riscv_v_ext_mode_p (mode))
> -    {
> -      *total = COSTS_N_INSNS (1);
> -      return true;
> -    }
> +    return riscv_vector_rtx_costs (x, mode, total);
This feels very much like a hack to me.  Why not just handle 
VEC_DUPLICATE like the rest of the opcodes in riscv_rtx_costs? 
Ultimately all that code needs to work together rather than having 
separate paths for vector/scalar.


>   
> +(define_code_iterator any_int_binop_no_shift_vx
> + [plus
> +])
> +
Did you plan to support any other binary ops?  ISTM that any binary op 
that provides a .vx form could be handled here.

Jeff

RE: [PATCH 1/3][GCC16-Stage-1] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

Reply via email to