Hi Zhongyao,
this looks generally OK (and useful!), just some minor nits:
> +/* Return TRUE if we should prefer agnostic vector code, FALSE otherwise.
I'd rather say "... if we should use the tail-agnostic and mask-agnostic
policies for vector code, false otherwise."
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 4fe0ae6d9..fa4d21125 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1144,9 +1144,9 @@ public:
> dflags |= demand_flags::DEMAND_LMUL_P;
> }
>
> - if (!m_ta)
> + if (!m_ta || riscv_prefer_agnostic_p ())
> dflags |= demand_flags::DEMAND_TAIL_POLICY_P;
> - if (!m_ma)
> + if (!m_ma || riscv_prefer_agnostic_p ())
> dflags |= demand_flags::DEMAND_MASK_POLICY_P;
Could you add a comment above that says something like "only merge demands if
the uarch does not have a preference"?
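To illustrate what I have in mind, here is a small stand-alone sketch. It is
not the actual riscv-vsetvl.cc code: the flag values are made up, and the
prefer_agnostic parameter is a stand-in for riscv_prefer_agnostic_p ().

```cpp
// Hypothetical, simplified flags; the values are not those of riscv-vsetvl.cc.
enum demand_flags : unsigned
{
  DEMAND_EMPTY_P = 0,
  DEMAND_TAIL_POLICY_P = 1u << 0,
  DEMAND_MASK_POLICY_P = 1u << 1,
};

// TA and MA mirror m_ta and m_ma; PREFER_AGNOSTIC stands in for
// riscv_prefer_agnostic_p ().
unsigned
policy_demands (bool ta, bool ma, bool prefer_agnostic)
{
  unsigned dflags = DEMAND_EMPTY_P;

  /* Only merge demands if the uarch does not have a preference.
     Otherwise demand the policy so it is kept as requested.  */
  if (!ta || prefer_agnostic)
    dflags |= DEMAND_TAIL_POLICY_P;
  if (!ma || prefer_agnostic)
    dflags |= DEMAND_MASK_POLICY_P;

  return dflags;
}
```

With no preference and both policies agnostic, nothing is demanded and the
policies stay free to be merged; with a preference, both are always demanded.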
> +/* { dg-final { scan-assembler-times {vsetvli} 4 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118945-2.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118945-2.c
> new file mode 100644
> index 000000000..9b9844446
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr118945-2.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rva23u64 -mtune=generic-ooo -Ofast -S
> -fno-schedule-insns -fno-schedule-insns2" } */
> +
> +void vmult(
> + double* dst,
> + const double* src,
> + const unsigned int* rowstart,
> + const unsigned int* colnums,
> + const double* val,
> + const unsigned int n_rows
> +) {
> + const double* val_ptr = &val[rowstart[0]];
> + const unsigned int* colnum_ptr = &colnums[rowstart[0]];
> + double* dst_ptr = dst;
> +
> + for (unsigned int row = 0; row < n_rows; ++row) {
> + double s = 0.;
> + const double* const val_end_of_row = &val[rowstart[row + 1]];
> + while (val_ptr != val_end_of_row) {
> + s += *val_ptr++ * src[*colnum_ptr++];
> + }
> + *dst_ptr++ = s;
> + }
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli} 5 } } */
Is it possible to refine the test so it checks for the agnostic/non-undisturbed
mask explicitly, or even checks for specific .vsetvl dump output? Then we
might not even need to disable scheduling.
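Something along these lines, perhaps. This is an untested sketch: the regexes
are my assumption, and whether every vsetvli in this test really ends up as
ta, ma is something you would need to confirm.

```c
/* Hypothetical checks: no vsetvli should request the undisturbed
   tail or mask policy.  */
/* { dg-final { scan-assembler-not {vsetvli[^\n]*,\s*tu,} } } */
/* { dg-final { scan-assembler-not {vsetvli[^\n]*,\s*mu} } } */
```

Checks like these do not depend on the number of vsetvlis, so they should be
less sensitive to scheduling than the scan-assembler-times count.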
--
Regards
Robin