Oh, I see. I think I got this wrong.
I should adjust the cost for VEC_EXTRACT, not VEC_SET.
But it's odd: I didn't see the loop vectorizer scanning the scalar_to_vec
cost in the vect dump.
The vect tree:
# a.4_25 = PHI <1(2), _4(11)>
# ivtmp_30 = PHI <18(2), ivtmp_20(11)>
# vect_vec_iv_.10_137 = PHI <{ 1, 2, 3, ... }(2), vect_vec_iv_.10_137(11)>
# ivtmp_149 = PHI <0(2), ivtmp_150(11)>
# loop_len_146 = PHI <18(2), _155(11)>
vect_patt_28.11_139 = (vector([2048,2048]) unsigned short) vect_vec_iv_.10_137;
_22 = (int) a.4_25;
vect_patt_26.12_141 = MIN_EXPR <vect_patt_28.11_139, { 15, ... }>;
vect_patt_10.13_143 = { 32872, ... } >> vect_patt_26.12_141;
_12 = 32872 >> _22;
vect_patt_27.14_144 = VIEW_CONVERT_EXPR<vector([2048,2048]) short int>(vect_patt_10.13_143);
b_7 = (short int) _12;
_4 = a.4_25 + 1;
ivtmp_20 = ivtmp_30 - 1;
ivtmp_150 = ivtmp_149 + POLY_INT_CST [2048, 2048];
_153 = MIN_EXPR <ivtmp_150, 18>;
_154 = 18 - _153;
_155 = MIN_EXPR <_154, POLY_INT_CST [2048, 2048]>;
if (_155 != 0)
goto <bb 11>; [0.00%]
else
goto <bb 16>; [100.00%]
<bb 16> [local count: 118111600]:
# vect_patt_27.14_145 = PHI <vect_patt_27.14_144(8)>
# loop_len_156 = PHI <loop_len_146(8)>
_147 = loop_len_156 + 18446744073709551615;
_148 = .VEC_EXTRACT (vect_patt_27.14_145, _147);
b_5 = _148;
a = 19;
_14 = b_5 != 0;
_15 = (int) _14;
return _15;
The cost computation in the vect dump only includes vector_stmt and scalar_to_vec.
It seems it doesn't consider the VEC_EXTRACT cost?
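
For reference, the epilogue in the dump above can be modelled in scalar C. This is only a sketch of what the GIMPLE computes, not GCC code: last_active_lane and vec_extract_model are hypothetical helper names. It shows that adding 18446744073709551615 to loop_len is just loop_len - 1 in 64-bit unsigned arithmetic, so .VEC_EXTRACT reads the last active lane.

```c
#include <stdint.h>

/* Hypothetical helper: mirrors "_147 = loop_len_156 + 18446744073709551615".
   In 64-bit unsigned arithmetic, adding 2^64 - 1 is the same as
   subtracting 1.  The caller must pass loop_len >= 1.  */
static uint64_t
last_active_lane (uint64_t loop_len)
{
  return loop_len + UINT64_C (18446744073709551615);
}

/* Hypothetical scalar model of "_148 = .VEC_EXTRACT (vec, idx)":
   read one lane of a vector that is held in memory.  */
static int16_t
vec_extract_model (const int16_t *vec, uint64_t idx)
{
  return vec[idx];
}
```

Assuming 16 lanes of 16-bit elements (zvl256b, e16, m1), the 18 iterations split into 16 + 2, so the final loop_len is 2 and the extracted index is 1, which would match the vslidedown.vi v1,v1,1 in the generated assembly.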
From: Richard Biener
Date: 2024-01-11 17:18
To: Juzhe-Zhong
CC: gcc-patches; kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Increase scalar_to_vec_cost from 1 to 3
On Thu, Jan 11, 2024 at 9:24 AM Juzhe-Zhong wrote:
>
> This patch fixes the following inefficient vectorized code:
>
> vsetvli a5,zero,e8,mf2,ta,ma
> li a2,17
> vid.v v1
> li a4,-32768
> vsetvli zero,zero,e16,m1,ta,ma
> addiw a4,a4,104
> vmv.v.i v3,15
> lui a1,%hi(a)
> li a0,19
> vsetvli zero,zero,e8,mf2,ta,ma
> vadd.vx v1,v1,a2
> sb a0,%lo(a)(a1)
> vsetvli zero,zero,e16,m1,ta,ma
> vzext.vf2 v2,v1
> vmv.v.x v1,a4
> vminu.vv v2,v2,v3
> vsrl.vv v1,v1,v2
> vslidedown.vi v1,v1,1
> vmv.x.s a0,v1
> snez a0,a0
> ret
>
> The reason is that scalar_to_vec_cost is too low.
>
> Considering that VEC_SET always needs a slide plus a scalar move instruction,
> scalar_to_vec_cost = 1 (the current cost) is not reasonable.
scalar_to_vec is supposed to model a splat of GPR/FPR to a vector register.
We probably want to overhaul the cost classes, esp. vec_to_scalar, but of
course not now.
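
To make the distinction concrete, here is a rough scalar model in C of the three operations being discussed. The function names and NLANES are made up for illustration and are not GCC or RVV interfaces. In the assembly above, the splat corresponds to something like vmv.v.x, while the extract corresponds to the vslidedown.vi + vmv.x.s pair.

```c
#include <stddef.h>
#include <stdint.h>

#define NLANES 8 /* Arbitrary lane count chosen for illustration.  */

/* Splat (broadcast): every lane receives the same scalar.
   This is the operation scalar_to_vec is meant to model.  */
static void
splat_model (int16_t vec[NLANES], int16_t scalar)
{
  for (size_t i = 0; i < NLANES; i++)
    vec[i] = scalar;
}

/* VEC_SET: overwrite a single lane and leave the others untouched.  */
static void
vec_set_model (int16_t vec[NLANES], size_t idx, int16_t scalar)
{
  vec[idx] = scalar;
}

/* VEC_EXTRACT: read a single lane back into a scalar.  */
static int16_t
vec_get_model (const int16_t vec[NLANES], size_t idx)
{
  return vec[idx];
}
```
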
> I tried setting it to 2, but that failed to fix this case; I need to
> set it to 3 to fix it.
>
> Whether scalar move or slide instruction, I believe they are more costly
> than normal vector instructions (e.g. vadd.vv), so setting it to 3 looks
> reasonable to me.
>
> After this patch:
>
> lui a5,%hi(a)
> li a4,19
> sb a4,%lo(a)(a5)
> li a0,0
> ret
>
> Tested on both RV32/RV64 with no regressions. OK for trunk?
>
> PR target/113281
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc: Set scalar_to_vec_cost as 3.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/pr113209.c: Adapt test.
> * gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c: New test.
>
> ---
> gcc/config/riscv/riscv.cc | 4 ++--
> .../vect/costmodel/riscv/rvv/pr113281-1.c | 18 ++++++++++++++++++
> .../gcc.target/riscv/rvv/autovec/pr113209.c | 2 +-
> 3 files changed, 21 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index df9799d9c5e..bcfb3c15a39 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -366,7 +366,7 @@ static const common_vector_cost rvv_vls_vector_cost = {
> 1, /* gather_load_cost */
> 1, /* scatter_store_cost */
> 1, /* vec_to_scalar_cost */
> - 1, /* scalar_to_vec_cost */
> + 3, /* scalar_to_vec_cost */
> 1, /* permute_cost */
> 1, /* align_load_cost */
> 1, /* align_store_cost */
> @@ -382,7 +382,7 @@ static const scalable_vector_cost rvv_vla_vector_cost = {
> 1, /* gather_load_cost */
> 1, /* scatter_store_cost */
> 1, /* vec_to_scalar_cost */
> - 1, /* scalar_to_vec_cost */
> + 3, /* scalar_to_vec_cost */
> 1, /* permute_cost */
> 1, /* align_load_cost */
> 1, /* align_store_cost */
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c
> new file mode 100644
> index 00000000000..331cf961a1f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr113281-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -ftree-vectorize -fdump-tree-vect-details" } */
> +
> +unsigned char a;
> +
> +int main() {
> + short b = a = 0;
> + for (; a != 19; a++)
> + if (a)
> + b = 32872 >> a;
> +
> + if (b == 0)
> + return 0;
> + else
> + return 1;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113209.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113209.c
> index 081ee369394..70aae151000 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113209.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113209.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3" } */
> +/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64d -O3 -fno-vect-cost-model" } */
>
> int b, c, d, f, i, a;
> int e[1] = {0};
> --
> 2.36.3
>
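
As a sanity check on the expected result, the loop from the new test can be evaluated as plain scalar C. The function name and the renamed global below are assumptions made for illustration and are not part of the patch; the loop body is unchanged. Since 32872 is less than 65536, any shift by 16 or more gives 0, so b is 0 after the last iteration (a == 18). The function then returns 0 with the global left at 19, which agrees with the "li a0,0" and the store of 19 in the post-patch assembly.

```c
/* Scalar reference for the loop in pr113281-1.c.  The global is renamed
   and the body is wrapped in a hypothetical function so that it can be
   called from a test; the logic is unchanged.  */
unsigned char a_ref;

int
pr113281_scalar_ref (void)
{
  short b = a_ref = 0;
  for (; a_ref != 19; a_ref++)
    if (a_ref)
      b = 32872 >> a_ref;

  /* The last iteration has a_ref == 18, and 32872 >> 18 is 0.  */
  return b != 0;
}
```
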