https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122846

            Bug ID: 122846
           Summary: risc-v rvv widening operations would perform better
                    with a wider LMUL
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bergner at oss dot tenstorrent.com
  Target Milestone: ---

The following simplified test case, taken from a benchmark, shows a case where
using a larger LMUL value would improve performance.  The problem is that for
widening operations, the final (widest) result gets the maximum LMUL value
(here the default, LMUL=1), so the operations that feed into that result must
use smaller, fractional LMUL values.  In this case, we use mf4 for our vector
loads, so each load fills only a quarter of a vector register!  It would be
better to use LMUL=1 for the loads and then use larger LMUL values for the
widening operations.
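The constraint comes down to a little arithmetic.  An RVV register group holds
VLMAX = LMUL * VLEN / SEW elements.  Each widening step doubles SEW, so the
element count stays fixed only if LMUL doubles too (the SEW/LMUL ratio is
constant across the chain).  The sketch below assumes VLEN = 128 bits, the
minimum the V extension guarantees, which is consistent with the
"vsetivli zero,4,e32,m1" in the listing further down.  The struct lmul type
and the vlmax helper are hypothetical names for illustration, not GCC
internals.

```c
#include <assert.h>

/* Assumption: VLEN = 128 bits (minimum guaranteed by the V extension). */
enum { VLEN_BITS = 128 };

/* LMUL as the fraction num/den: mf4 = {1, 4}, m1 = {1, 1}, m4 = {4, 1}. */
struct lmul {
    unsigned num;
    unsigned den;
};

/* VLMAX = LMUL * VLEN / SEW: the number of SEW-bit elements that one
   register group with the given LMUL can hold. */
static unsigned vlmax(unsigned sew_bits, struct lmul l)
{
    assert(sew_bits > 0 && l.num > 0 && l.den > 0);
    return (VLEN_BITS / sew_bits) * l.num / l.den;
}
```

With this helper, the chain GCC currently picks (e8,mf4 -> e16,mf2 -> e32,m1)
gives vlmax() = 4 at every step, while the suggested chain
(e8,m1 -> e16,m2 -> e32,m4) gives 16 at every step, i.e. four times as many
elements per loop iteration under the same VLEN assumption.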

linux~$ cat test.c
int
foo (const char *x, const char *y)
{
  int sum = 0;
  for (int i = 0; i < 1024; i++)
    sum += x[i] * y[i];
  return sum;
}

linux~$ gcc -S -O2 -march=rv64imv test.c
linux~$ cat test.s
[snip]
foo:
.LFB0:
        .cfi_startproc
        vsetivli        zero,4,e32,m1,ta,ma
        vmv.v.i v1,0
        addi    a5,a0,1024
.L2:
        vsetvli zero,zero,e8,mf4,ta,ma
        vle8.v  v3,0(a1)
        vle8.v  v4,0(a0)
        addi    a0,a0,4
        addi    a1,a1,4
        vwmul.vv        v2,v4,v3
        vmv1r.v v3,v1
        vsetvli zero,zero,e16,mf2,ta,ma
        vwadd.wv        v1,v3,v2
        bne     a5,a0,.L2
        vsetvli zero,zero,e32,m1,ta,ma
        vmv.s.x v2,zero
        vredsum.vs      v1,v1,v2
        vmv.x.s a0,v1
        ret
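To make the shape of this loop explicit, it can be modeled in scalar C.  The
hypothetical dot_lanes function below keeps "lanes" independent partial sums,
playing the role of the accumulator v1, and adds them at the end, as
vredsum.vs does.  It models only the lane structure: all arithmetic is done in
int, as in test.c, and the element widths of the vwmul.vv/vwadd.wv chain are
not modeled.  The loop above advances a0 and a1 by 4 bytes per iteration,
i.e. 4 lanes and 1024/4 = 256 iterations; with e8,m1 loads (16 lanes,
assuming VLEN = 128) the same loop would need only 1024/16 = 64 iterations.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_LANES = 64 };

/* Hypothetical scalar model of the vectorized loop: "lanes" independent
   partial sums (like v1), combined by a final horizontal add (like
   vredsum.vs). Requires n to be a multiple of lanes. */
static int dot_lanes(const char *x, const char *y, size_t n, size_t lanes)
{
    int partial[MAX_LANES] = { 0 };
    assert(lanes > 0 && lanes <= MAX_LANES && n % lanes == 0);

    /* Main loop: each iteration consumes "lanes" elements of x and y. */
    for (size_t i = 0; i < n; i += lanes)
        for (size_t l = 0; l < lanes; l++)
            partial[l] += x[i + l] * y[i + l];

    /* Reduction: add the per-lane partial sums into a scalar. */
    int sum = 0;
    for (size_t l = 0; l < lanes; l++)
        sum += partial[l];
    return sum;
}
```

For n = 1024, dot_lanes(x, y, 1024, 4) and dot_lanes(x, y, 1024, 16) return
the same value as foo(x, y); only the number of main-loop iterations differs,
which is the point of the wider-LMUL suggestion.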
