https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113112
Bug ID: 113112
Summary: RISC-V: Dynamic LMUL feature stabilization for GCC-14 release
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: juzhe.zhong at rivai dot ai
Target Milestone: ---

Created attachment 56922
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56922&action=edit
dynamic LMUL fail case

Hi, as we know, the dynamic LMUL feature is supported but not yet stable. As
far as I know, full coverage testing shows only these 2 execution FAILs:

FAIL: gcc.dg/pr30957-1.c execution test
FAIL: gcc.dg/signbit-5.c execution test

They are not real failures; the tests just need to be adjusted.

I have tested on the K230 and other hardware, and it turns out we get over 30%
performance improvement (compared with the default LMUL = M1) on various
benchmarks when we select a reasonably large LMUL that causes no additional
register spills.

However, I also find that some benchmarks show a significant performance drop
(compared with the default LMUL = M1) when using dynamic LMUL. I am pretty
sure this is because we pick a wrong, too-large LMUL (LMUL > 1), which causes
additional register spills and therefore bad performance in such situations.

For example:

#include <stdint-gcc.h>
#define N 40
int a[N];

__attribute__ ((noinline)) int
foo (int n)
{
  int i, j;
  int sum, x;
  for (i = 0; i < n; i++)
    {
      sum = 0;
      for (j = 0; j < n; j++)
	sum += (i + j);
      a[i] = sum;
    }
  return 0;
}

Compiled with:
-march=rv64gcv -mabi=lp64d -O3 --param riscv-autovec-lmul=dynamic
--param riscv-autovec-preference=fixed-vlmax

ASM:

foo:
	ble	a0,zero,.L11
	lui	a2,%hi(.LANCHOR0)
	addi	sp,sp,-128
	addi	a2,a2,%lo(.LANCHOR0)
	mv	a1,a0
	vsetvli	a6,zero,e32,m8,ta,ma
	vid.v	v8
	vs8r.v	v8,0(sp)
.L3:
	vl8re32.v	v16,0(sp)
	vsetvli	a4,a1,e8,m2,ta,ma
	li	a3,0
	vsetvli	a5,zero,e32,m8,ta,ma
	vmv8r.v	v0,v16
	vmv.v.x	v8,a4
	vmv.v.i	v24,0
	vadd.vv	v8,v16,v8
	vmv8r.v	v16,v24
	vs8r.v	v8,0(sp)
.L4:
	addiw	a3,a3,1
	vadd.vv	v8,v0,v16
	vadd.vi	v16,v16,1
	vadd.vv	v24,v24,v8
	bne	a0,a3,.L4
	vsetvli	zero,a4,e32,m8,ta,ma
	sub	a1,a1,a4
	vse32.v	v24,0(a2)
	slli	a4,a4,2
	add	a2,a2,a4
	bne	a1,zero,.L3
	li	a0,0
	addi	sp,sp,128
	jr	ra
.L11:
	li	a0,0
	ret

As we can see, we pick LMUL = 8 and then spill: with LMUL = 8 there are only
four register groups (v0, v8, v16, v24), which is not enough for the live
vector values in this loop, so a group is spilled to the stack
(vs8r.v / vl8re32.v around .L3).

This case was found by the following check I added into the mov pattern:

if (known_gt (GET_MODE_SIZE (mode), BYTES_PER_RISCV_VECTOR)
    && riscv_autovec_lmul == RVV_DYNAMIC
    && lra_in_progress)
  gcc_unreachable ();

The attachment lists the cases where we pick an incorrectly large LMUL that
causes additional spills.

I will work on this issue in the following days.
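
For reference, here is a minimal standalone sketch of the kind of
register-pressure check the dynamic LMUL selection needs, written as plain C
rather than against the actual GCC cost model; the function names, the
live-value estimate, and the pressure model are assumptions for illustration
only, not the real implementation:

/* Standalone sketch (not GCC internals): pick the largest LMUL in
   {1, 2, 4, 8} such that the estimated number of simultaneously live
   vector values still fits in the 32 RVV architectural registers once
   each value occupies LMUL registers.  choose_dynamic_lmul and the
   live-value count are hypothetical names used only for this sketch.  */

#include <stdio.h>

#define NUM_VECTOR_REGS 32

static int
choose_dynamic_lmul (int max_live_vector_values)
{
  /* Try the largest grouping first; shrink until no spill is implied.  */
  for (int lmul = 8; lmul > 1; lmul /= 2)
    if (max_live_vector_values * lmul <= NUM_VECTOR_REGS)
      return lmul;
  return 1; /* Default: LMUL = M1 always fits.  */
}

int
main (void)
{
  /* The loop above keeps roughly 5 vector values live at once
     (v0, v8, v16, v24 plus the spilled temporary), so LMUL = 8 needs
     5 * 8 = 40 > 32 registers and must spill, while LMUL = 4
     (5 * 4 = 20 <= 32) would not.  */
  printf ("live=5 -> LMUL=%d\n", choose_dynamic_lmul (5));
  printf ("live=3 -> LMUL=%d\n", choose_dynamic_lmul (3));
  return 0;
}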