On Wed, 13 Dec 2023, Juzhe-Zhong wrote:

> Hi, before this patch, a simple conversion case produced the following RVV
> codegen:
>
> foo:
>         ble       a2,zero,.L8
>         addiw     a5,a2,-1
>         li        a4,6
>         bleu      a5,a4,.L6
>         srliw     a3,a2,3
>         slli      a3,a3,3
>         add       a3,a3,a0
>         mv        a5,a0
>         mv        a4,a1
>         vsetivli  zero,8,e16,m1,ta,ma
> .L4:
>         vle8.v    v2,0(a5)
>         addi      a5,a5,8
>         vzext.vf2 v1,v2
>         vse16.v   v1,0(a4)
>         addi      a4,a4,16
>         bne       a3,a5,.L4
>         andi      a5,a2,-8
>         beq       a2,a5,.L10
> .L3:
>         slli      a4,a5,32
>         srli      a4,a4,32
>         subw      a2,a2,a5
>         slli      a2,a2,32
>         slli      a5,a4,1
>         srli      a2,a2,32
>         add       a0,a0,a4
>         add       a1,a1,a5
>         vsetvli   zero,a2,e16,m1,ta,ma
>         vle8.v    v2,0(a0)
>         vzext.vf2 v1,v2
>         vse16.v   v1,0(a1)
> .L8:
>         ret
> .L10:
>         ret
> .L6:
>         li        a5,0
>         j         .L3
>
> This vectorization goes through the first loop:
>
>         vsetivli  zero,8,e16,m1,ta,ma
> .L4:
>         vle8.v    v2,0(a5)
>         addi      a5,a5,8
>         vzext.vf2 v1,v2
>         vse16.v   v1,0(a4)
>         addi      a4,a4,16
>         bne       a3,a5,.L4
>
> Each iteration processes 8 elements.
>
> For scalable vectorization on a CPU with VLEN >= 128 bits, this is fine
> when VLEN = 128. But as soon as VLEN > 128 bits (e.g. VLEN = 256 bits), it
> wastes CPU resources: only half of the vector units are working and the
> other half is idle.
>
> After investigation, I realized that I forgot to adjust the COST for
> SELECT_VL. So this patch adjusts the COST for SELECT_VL style length
> vectorization from 3 to 2, since each length update now needs only one
> SELECT_VL and one shift. After this patch:
>
> foo:
>         ble       a2,zero,.L5
> .L3:
>         vsetvli   a5,a2,e16,m1,ta,ma   -----> SELECT_VL cost.
>         vle8.v    v2,0(a0)
>         slli      a4,a5,1              -----> additional shift of the SELECT_VL result for memory address calculation.
>         vzext.vf2 v1,v2
>         sub       a2,a2,a5
>         vse16.v   v1,0(a1)
>         add       a0,a0,a5
>         add       a1,a1,a4
>         bne       a2,zero,.L3
> .L5:
>         ret
>
> This patch is a simple fix for something I previously forgot.
>
> Ok for trunk?
OK.

Richard.

> If not, I am going to adjust cost in backend cost model.
>
>         PR target/111317
>
> gcc/ChangeLog:
>
>         * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust
>         COST for decrement IV.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.
>
> ---
>  .../gcc.dg/vect/costmodel/riscv/rvv/pr111317.c | 12 ++++++++++++
>  gcc/tree-vect-loop.cc                          | 17 ++++++++++++++---
>  2 files changed, 26 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
> new file mode 100644
> index 00000000000..d4bea242a9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111317.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize --param=riscv-autovec-lmul=m1" } */
> +
> +void
> +foo (char *__restrict a, short *__restrict b, int n)
> +{
> +  for (int i = 0; i < n; i++)
> +    b[i] = (short) a[i];
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e16,\s*m1,\s*t[au],\s*m[au]} 1 } } */
> +/* { dg-final { scan-assembler-times {vsetvli} 1 } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 6261cd1be1d..19e38b8637b 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4870,10 +4870,21 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
>        if (partial_load_store_bias != 0)
>          body_stmts += 1;
>
> -      /* Each may need two MINs and one MINUS to update lengths in body
> -         for next iteration.  */
> +      unsigned int length_update_cost = 0;
> +      if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +        /* For decrement IV style, we use a single SELECT_VL at the
> +           start of each iteration to calculate the number of elements
> +           that need to be processed in the current iteration, and a SHIFT
> +           operation to compute the next memory address instead of adding
> +           the vectorization factor.  */
> +        length_update_cost = 2;
> +      else
> +        /* For increment IV style, each may need two MINs and one MINUS to
> +           update lengths in body for next iteration.  */
> +        length_update_cost = 3;
> +
>        if (need_iterate_p)
> -        body_stmts += 3 * num_vectors;
> +        body_stmts += length_update_cost * num_vectors;
>      }
>
>    (void) add_stmt_cost (target_cost_data, prologue_stmts,

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich;
(HRB 36809, AG Nuernberg)