https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #10 from Levy <admin at levyhsu dot com> ---
Created attachment 49500
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49500&action=edit
Optimization Patch for QI/HImode(32bit) and QI/HI/SImode(64bit)

Proposing a second patch for QI/HImode (32-bit) and QI/HI/SImode (64-bit).
It covers both the zero-extend and the subreg cases.

Tested with make report-gcc.
Two test cases fail: shorten-memrefs-5.c and shorten-memrefs-6.c

Both fail because of this DejaGnu directive:
/* { dg-final { scan-assembler "load1r:\n\taddi" } } */

But look at the assembly generated for the same input function, which appears in both files:

int load1r (int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

The current GCC RISC-V port produces:
load1r:
        addi    a5,a0,768
        lw      a4,36(a5)
        lw      a0,32(a5)
        addw    a0,a0,a4
        lw      a4,40(a5)
        addw    a4,a4,a0
        lw      a0,44(a5)
        addw    a0,a0,a4
        ret
The patched port produces:
load1r:
        lwu     a4,800(a0)
        lwu     a5,804(a0)
        addw    a5,a5,a4
        lwu     a4,808(a0)
        lwu     a0,812(a0)
        addw    a5,a5,a4
        addw    a0,a5,a0
        ret
With one instruction fewer, the patched output looks correct to me and possibly even faster.
However, we still need a test for sign extension and a performance check.
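
As a rough sanity check of why the lwu version can still return the same value, here is a minimal C sketch that models the three instructions involved and replays both sequences. It is not part of the patch. The helper names (lw, lwu, addw, load1r_current, load1r_patched, check_example) are made up for illustration, and the model assumes the usual RV64 semantics: lw sign-extends the loaded word, lwu zero-extends it, and addw adds the low 32 bits of its operands and sign-extends the 32-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model of RV64 LW: load 32 bits, sign-extend to 64. */
static int64_t lw(const int32_t *p) { return (int64_t)*p; }

/* Assumed model of RV64 LWU: load 32 bits, zero-extend to 64. */
static int64_t lwu(const int32_t *p) { return (int64_t)(uint64_t)(uint32_t)*p; }

/* Assumed model of RV64 ADDW: add the low 32 bits, sign-extend the result. */
static int64_t addw(int64_t a, int64_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;   /* wraps modulo 2^32 */
    return (int64_t)sum - ((sum & 0x80000000u) ? ((int64_t)1 << 32) : 0);
}

/* Replays the sequence emitted by the current port. */
static int64_t load1r_current(const int32_t *array)
{
    const int32_t *a5 = array + 192;    /* addi a5,a0,768 */
    int64_t a4 = lw(a5 + 9);            /* lw   a4,36(a5) */
    int64_t a0 = lw(a5 + 8);            /* lw   a0,32(a5) */
    a0 = addw(a0, a4);                  /* addw a0,a0,a4  */
    a4 = lw(a5 + 10);                   /* lw   a4,40(a5) */
    a4 = addw(a4, a0);                  /* addw a4,a4,a0  */
    a0 = lw(a5 + 11);                   /* lw   a0,44(a5) */
    a0 = addw(a0, a4);                  /* addw a0,a0,a4  */
    return a0;
}

/* Replays the sequence emitted by the patched port. */
static int64_t load1r_patched(const int32_t *array)
{
    int64_t a4 = lwu(array + 200);      /* lwu  a4,800(a0) */
    int64_t a5 = lwu(array + 201);      /* lwu  a5,804(a0) */
    a5 = addw(a5, a4);                  /* addw a5,a5,a4   */
    a4 = lwu(array + 202);              /* lwu  a4,808(a0) */
    int64_t a0 = lwu(array + 203);      /* lwu  a0,812(a0) */
    a5 = addw(a5, a4);                  /* addw a5,a5,a4   */
    a0 = addw(a5, a0);                  /* addw a0,a5,a0   */
    return a0;
}

/* Compares both sequences on negative inputs, where lw and lwu differ. */
static int check_example(void)
{
    int32_t array[204] = {0};
    array[200] = -1;
    array[201] = -2;
    array[202] = INT32_MIN;
    array[203] = 5;
    assert(lwu(&array[200]) != lw(&array[200]));  /* the loads do differ */
    assert(load1r_current(array) == load1r_patched(array));
    return 1;
}
```

Calling check_example() from a main() exercises the comparison. This only checks the model above on one input; it is no substitute for a real sign-extension test in the testsuite.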

A test case and performance evaluation will follow later (hopefully).
