On 1/25/25 1:07 AM, pan2...@intel.com wrote:
From: Pan Li <pan2...@intel.com>

After we enabled the late-combine pass after the mode-switching pass, it
will try to combine the insn patterns below into a no-op.  For example:

(insn 40 5 41 2 (set (reg:SI 11 a1 [151])
   (reg:SI 69 frm)) "pr118103-simple.c":67:15 2712 {frrmsi}
   (nil))
(insn 41 40 7 2 (set (reg:SI 69 frm)
   (const_int 2 [0x2])) "pr118103-simple.c":69:8 2710 {fsrmsi_restore}
   (nil))
(insn 42 10 11 2 (set (reg:SI 69 frm)
   (reg:SI 11 a1 [151])) "pr118103-simple.c":70:8 2710 {fsrmsi_restore}
     (nil))

trying to combine definition of r11 in:
40: a1:SI=frm:SI
     into:
42: frm:SI=a1:SI
     instruction becomes a no-op:
(set (reg:SI 69 frm)
(reg:SI 69 frm))
original cost = 4 + 4 (weighted: 8.000000), replacement cost =
2147483647; keeping replacement
rescanning insn with uid = 42.
updating insn 42 in-place
verify found no changes in insn with uid = 42.
deleting insn 40
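
To make the effect of the dump above concrete, here is a minimal sketch in
plain C.  It is only a model, not GCC code: the variable frm stands in for
the FRM CSR, and the names frm_mode, run_as_emitted and run_after_combine
are hypothetical.  It assumes the substitution was performed without
accounting for the write to frm in insn 41, which is what the dump suggests.

```c
/* RISC-V FRM encodings used in this model (RNE = 0, RDN = 2). */
enum frm_mode { FRM_RNE = 0, FRM_RDN = 2 };

/* Hypothetical stand-in for the FRM CSR; a real CSR is not a C variable. */
static int frm;

/* The sequence as emitted by mode-switching: save, set, restore. */
static int run_as_emitted(void)
{
    frm = FRM_RNE;      /* rounding mode on entry */
    int a1 = frm;       /* insn 40: a1 = frm */
    frm = FRM_RDN;      /* insn 41: frm = 2 */
    frm = a1;           /* insn 42: frm = a1 */
    return frm;         /* mode seen by the following instruction */
}

/* After the combine: insn 42 has become "frm = frm" and insn 40 is gone. */
static int run_after_combine(void)
{
    frm = FRM_RNE;      /* rounding mode on entry */
    frm = FRM_RDN;      /* insn 41 survives */
    frm = frm;          /* insn 42 as a no-op: nothing is restored */
    return frm;
}
```

In this model run_as_emitted() returns FRM_RNE while run_after_combine()
returns FRM_RDN, i.e. the static rounding mode leaks into the code that
follows.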

For example, we have code as below:
    int test_exampe () {
      test ();

      size_t vl = 4;
      vfloat16m1_t va = __riscv_vle16_v_f16m1(a, vl);
      va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
      va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);

      __riscv_vse16_v_f16m1(b, va, vl);

      return 0;
    }

It will be compiled to:
    main:
        addi    sp,sp,-16
        sd      ra,8(sp)
        call    initialize
        lui     a6,%hi(b)
        lui     a2,%hi(a)
        addi    a3,a6,%lo(b)
        addi    a2,a2,%lo(a)
        li      a4,4
    .L8:
        fsrmi   2
        vsetvli a5,a4,e16,m1,ta,ma
        vle16.v v1,0(a2)
        slli    a1,a5,1
        subw    a4,a4,a5
        add     a2,a2,a1
        vfnmadd.vv  v1,v1,v1
        >> The fsrm a0 insn is deleted by late-combine <<
        vfmsub.vv   v1,v1,v1
        vse16.v v1,0(a3)
        add     a3,a3,a1
        bgt     a4,zero,.L8
        lh      a4,%lo(b)(a6)
        li      a5,-20480
        addi    a5,a5,-1382
        bne     a4,a5,.L14
        ld      ra,8(sp)
        li      a0,0
        addi    sp,sp,16
        jr      ra

This patch adds the FRM register to global_regs, as it is a
cooperatively-managed global register, so that the fsrm insn is no
longer eliminated by late-combine.  The related spec17 cam4 failure may
also be caused by this issue.

The test suites below passed for this patch.
* The rv64gcv full regression test.

        PR target/118103

gcc/ChangeLog:

        * config/riscv/riscv.cc (riscv_conditional_register_usage): Add
        the FRM as the global_regs.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/pr118103-1.c: New test.
        * gcc.target/riscv/rvv/base/pr118103-run-1.c: New test.

A bit surprised to hear that we've got vector code that explicitly sets the rounding mode. We should keep that in mind as I wouldn't be at all surprised to find designs that flush pipes on the rounding mode change (much like we've seen with vxrm). Point being that it may be so expensive that other code generation approaches might produce much better performance.

ACK for the change to add FRM to the global registers. Someone had a concern about strict aliasing for the test. So let's get that sorted out before committing.

jeff
