Re: [PATCH] RISC-V: Add per-type reduction costs to the vector cost model

Robin Dapp Wed, 06 May 2026 04:05:03 -0700

Hi Wang Yaduo,

> Add per-type reduction costs (i8/i16/i32/i64/f16/f32/f64) to the RISC-V
> vector cost model, distinguishing between ordered (fold-left) and
> unordered (tree) floating-point reductions.  When a reduction is
> detected, the per-type cost replaces the default vec_to_scalar_cost,
> similar to AArch64.  This causes _Float16 n=4 ordered reductions to no
> longer be vectorized in VLS mode due to the higher cost.
>
> gcc/ChangeLog:
>
>       * config/riscv/riscv-protos.h (common_vector_cost): Add per-type
>       reduction cost fields: reduc_i8_cost, reduc_i16_cost,
>       reduc_i32_cost, reduc_i64_cost, reduc_f16_cost, reduc_f32_cost,
>       reduc_f64_cost for unordered reductions, and reduc_f16_ordered_cost,
>       reduc_f32_ordered_cost, reduc_f64_ordered_cost for ordered
>       (fold-left) reductions.
>       * config/riscv/riscv.cc (rvv_vla_vector_cost): Initialize reduction
>       cost fields with default values.
>       (rvv_vls_vector_cost): Likewise.
>       * config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost): Add
>       reduction detection in the vec_to_scalar case.  When a reduction is
>       detected, replace the default vec_to_scalar_cost with the
>       appropriate per-type reduction cost based on element mode and
>       reduction kind (ordered vs unordered).
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/riscv/rvv/autovec/reduc/reduc_cost-1.c: New test for
>       VLA unordered reduction costs.
>       * gcc.target/riscv/rvv/autovec/reduc/reduc_cost-2.c: New test for
>       VLA ordered reduction costs.
>       * gcc.target/riscv/rvv/autovec/vls/reduc_cost-1.c: New test for
>       VLS reduction costs.
>       * gcc.target/riscv/rvv/autovec/vls/reduc-19.c: Update expected
>       vfredosum count from 9 to 8.
>       * gcc.target/riscv/rvv/autovec/vls/wred-3.c: Update expected
>       vfwredosum count from 17 to 16.
>
> Signed-off-by: Wang Yaduo <[email protected]>
> ---
>  gcc/config/riscv/riscv-protos.h               | 20 +++++-
>  gcc/config/riscv/riscv-vector-costs.cc        | 68 ++++++++++++++++++-
>  gcc/config/riscv/riscv.cc                     | 20 ++++++
>  .../riscv/rvv/autovec/reduc/reduc_cost-1.c    | 34 ++++++++++
>  .../riscv/rvv/autovec/reduc/reduc_cost-2.c    | 34 ++++++++++
>  .../riscv/rvv/autovec/vls/reduc-19.c          |  4 +-
>  .../riscv/rvv/autovec/vls/reduc_cost-1.c      | 41 +++++++++++
>  .../gcc.target/riscv/rvv/autovec/vls/wred-3.c |  4 +-
>  8 files changed, 219 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_cost-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_cost-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/reduc_cost-1.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index dd029c704..5da5a6a21 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -279,6 +279,24 @@ struct common_vector_cost
>  
>    /* Cost of an unaligned vector store.  */
>    const int unalign_store_cost;
> +
> +  /* Cost of vector reduction operations (unordered / tree reduction).
> +     Indexed by element type.  */
> +  const int reduc_i8_cost;
> +  const int reduc_i16_cost;
> +  const int reduc_i32_cost;
> +  const int reduc_i64_cost;
> +  const int reduc_f16_cost;
> +  const int reduc_f32_cost;
> +  const int reduc_f64_cost;


Do we need all of those?  I'm not sure, but given that they are supposed to be
implemented as tree reductions, the latency should not vary much with the
element size.

> +
> +  /* Cost of ordered (fold-left / strict) floating-point reductions.
> +     These are significantly more expensive than unordered (tree) reductions
> +     because RVV ordered reduction instructions (e.g. vfredosum) process
> +     elements sequentially.  */
> +  const int reduc_f16_ordered_cost;
> +  const int reduc_f32_ordered_cost;
> +  const int reduc_f64_ordered_cost;

Same here.  I'm not entirely sure, and uarchs might vary (wildly), but generally
these should scale linearly with the number of elements, so perhaps one factor
is enough?  Open for debate, though.
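
To make the two comments above concrete, here is a rough, untested sketch of
the shape I have in mind.  All names (reduc_cost_sketch, reduc_cost,
reduc_ordered_per_elt_cost, reduction_cost) are made up for illustration and
are not existing fields or functions.  How the element count is obtained, in
particular for VLA, is left open.

```cpp
/* Hypothetical sketch, not the actual riscv-protos.h layout: a single
   cost for unordered (tree) reductions and a single per-element factor
   for ordered (fold-left) reductions.  */
struct reduc_cost_sketch
{
  int reduc_cost;                 /* Unordered reduction, any element type.  */
  int reduc_ordered_per_elt_cost; /* Ordered reduction, per element.  */
};

/* Cost of one reduction over NELTS elements.  Unordered reductions are
   treated as independent of the element count, ordered ones as linear
   in it.  */
constexpr int
reduction_cost (const reduc_cost_sketch &c, bool ordered, int nelts)
{
  return ordered ? c.reduc_ordered_per_elt_cost * nelts : c.reduc_cost;
}
```

With such a layout an 8-element ordered reduction would cost twice as much as
a 4-element one, while the unordered cost stays the same for both.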

>  /* scalable vectorization (VLA) specific cost.  */
> @@ -289,7 +307,7 @@ struct scalable_vector_cost : common_vector_cost
>    {}
>  
>    /* TODO: We will need more other kinds of vector cost for VLA.
> -     E.g. fold_left reduction cost, lanes load/store cost, ..., etc.  */
> +     E.g. lanes load/store cost, ..., etc.  */
>  };

We have lane cost, so this comment can be removed.

> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -415,6 +415,16 @@ static const common_vector_cost rvv_vls_vector_cost = {
>    1, /* align_store_cost  */
>    2, /* unalign_load_cost  */
>    2, /* unalign_store_cost  */
> +  2, /* reduc_i8_cost  */
> +  2, /* reduc_i16_cost  */
> +  2, /* reduc_i32_cost  */
> +  2, /* reduc_i64_cost  */
> +  2, /* reduc_f16_cost  */
> +  2, /* reduc_f32_cost  */
> +  2, /* reduc_f64_cost  */
> +  6, /* reduc_f16_ordered_cost  */
> +  4, /* reduc_f32_ordered_cost  */
> +  2, /* reduc_f64_ordered_cost  */
>  };

Any reason why the scaling is +2 rather than *2?  I'd have expected twice the
work (and thus latency) for 2x the elements.  Also, even 2-6 seems rather low
compared to regular reductions.  Looking at the published Ascalon X numbers,
it's more like 5, 10, 20.
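
As a small worked example of the *2 scaling, assuming a fixed vector size and
that the 5, 10, 20 correspond to f64, f32 and f16 in that order.  The helper
name ordered_reduc_cost is invented for this illustration only.

```cpp
/* Illustrative only: ordered reduction cost when it scales linearly
   with the element count of a fixed-size vector.  BASE_F64_COST is the
   cost for 64-bit elements; halving the element width doubles the
   number of elements and therefore the cost.  */
constexpr int
ordered_reduc_cost (int base_f64_cost, int elt_bits)
{
  return base_f64_cost * (64 / elt_bits);
}

/* Starting from the patch's f64 value of 2, linear scaling gives
   f32 = 4 and f16 = 8, whereas the patch uses 6 for f16.  */
static_assert (ordered_reduc_cost (2, 32) == 4, "f32 is 2x f64");
static_assert (ordered_reduc_cost (2, 16) == 8, "f16 is 4x f64");

/* Starting from 5 gives the 5, 10, 20 progression.  */
static_assert (ordered_reduc_cost (5, 64) == 5, "f64");
static_assert (ordered_reduc_cost (5, 32) == 10, "f32");
static_assert (ordered_reduc_cost (5, 16) == 20, "f16");
```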


> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_cost-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_cost-1.c

Dedicated cost-model tests are better placed in the costmodel subdirectory.

>  #include "wred-2.c"

> -/* { dg-final { scan-assembler-times {vfwredosum\.vs} 17 } } */
> +/* The _Float16->float n=4 case is not vectorized because the ordered
> +   reduction cost makes it unprofitable for small trip counts.  */
> +/* { dg-final { scan-assembler-times {vfwredosum\.vs} 16 } } */

This test is supposed to check functionality, so I'd rather keep the expected
count of 17 and add -fno-vect-cost-model.

-- 
Regards
 Robin
