Hi. Vineet. The series of patches LGTM from myside.

But I wonder whether you would like to optimize VXRM which is using 
mode-switching too.

I saw in spec 2017 spec 624 x264.

csrwi vxrm is calling multiples times.


juzhe.zh...@rivai.ai
 
From: Vineet Gupta
Date: 2025-06-06 08:04
To: gcc-patches
CC: gnu-toolchain; Jeff Law; Robin Dapp; Juzhe Zhong; Pan Li; Kito Cheng; 
Vineet Gupta
Subject: [PATCH v2 0/5] RISC-V: frm state-machine improvements
Changes since v1
  - Dropped removal of TARGET_MODE_AFTER
  - NFC changes to last 2 patches, with reattribution to PRs they address 
seperately.
 
Hi,
 
This came out of Rivos perf team reporting (shoutout to Siavash) that
some of the SPEC2017 workloads had unnecessary FRM wiggles, when
none were needed. The writes in particular could be expensive.
 
I started with reduced test for PR/119164 from blender:node_testure_util.c.
 
However in trying to understand (and a botched rewrite of whole thing)
it turned out that lot of code was just unnecessary leading to more
complexity than warranted. As a result there are more deletions here and
the actual improvements come from just a few lines of actual changes.
 
I've verified each patch incrementally with
- Testsuite run (unchanged, 1 unexpected pass 
gcc.target/riscv/rvv/autovec/pr119114.c)
- SPEC build
- Static analysis of FRM read/write insns emitted in all of SPEC binaries.
- There's BPI date for some of this too, but the delta there is not
   significant as this could really be uarch specific.
 
Here's the result for static analysis.
 
            1. revert-confluence  2. remove-edge-insert  4-fewer-frm-restore  
5-call-backtrack
              -------------------  --------------------  -------------------  
---------------
                frrm fsrmi fsrm       frrm fsrmi fsrm       frrm fsrmi fsrm     
frrm fsrmi fsrm
    perlbench_r   42    0    4          42    0    4          17    0    1      
  17    0    1
       cpugcc_r  167    0   17         167    0   17          11    0    0      
  11    0    0
       bwaves_r   16    0    1          16    0    1          16    0    1      
  16    0    1
          mcf_r   11    0    0          11    0    0          11    0    0      
  11    0    0
   cactusBSSN_r   79    0   27          76    0   27          19    0    1      
  19    0    1
         namd_r  119    0   63         119    0   63          14    0    1      
  14    0    1
       parest_r  218    0  114         168    0  114          24    0    1      
  24    0    1
       povray_r  123    1   17         123    1   17          26    1    6      
  26    1    6
          lbm_r    6    0    0           6    0    0           6    0    0      
   6    0    0
      omnetpp_r   17    0    1          17    0    1          17    0    1      
  17    0    1
          wrf_r 2287   13 1956        2287   13 1956        1268   13 1603      
 613   13   82
     cpuxalan_r   17    0    1          17    0    1          17    0    1      
  17    0    1
       ldecod_r   11    0    0          11    0    0          11    0    0      
  11    0    0
         x264_r   14    0    1          14    0    1          11    0    0      
  11    0    0
      blender_r  724   12  182         724   12  182          61   12   42      
  39   12   16
         cam4_r  324   13  169         324   13  169          45   13   20      
  40   13   17
    deepsjeng_r   11    0    0          11    0    0          11    0    0      
  11    0    0
      imagick_r  265   16   34         265   16   34         132   16   25      
  33   16   18
        leela_r   12    0    0          12    0    0          12    0    0      
  12    0    0
          nab_r   13    0    1          13    0    1          13    0    1      
  13    0    1
    exchange2_r   16    0    1          16    0    1          16    0    1      
  16    0    1
    fotonik3d_r   20    0   11          20    0   11          19    0    1      
  19    0    1
         roms_r   33    0   23          33    0   23          21    0    1      
  21    0    1
           xz_r    6    0    0           6    0    0           6    0    0      
   6    0    0
              --------------------  -------------------  -------------------  
----------------
                4551   55 2623        4498   55 2623        1804   55 1707      
1023   55  150
              --------------------  -------------------  -------------------  
----------------
                          7729                  7176                  3566      
          1228
              --------------------  -------------------  -------------------  
----------------
 
Note that wrf still has ridiculously high number of FRM ops which will be 
tackled as a follow-up.
 
Please review.
 
Thx,
-Vineet
 
Vineet Gupta (5):
  emit-rtl: document next_nonnote_nondebug_insn_bb () can breach into
    next BB
  RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE
  RISC-V: frm/mode-switch: remove dubious frm edge insertion before
    call_insn
  RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition
    [PR119164]
  RISC-V: frm/mode-switch: robustify call_insn backtracking [PR120203]
 
gcc/config/riscv/riscv.cc                     | 121 +++---------------
gcc/emit-rtl.cc                               |   6 +-
.../rvv/base/float-point-dynamic-frm-74.c     |   2 +-
.../gcc.target/riscv/rvv/base/pr119164.c      |  22 ++++
4 files changed, 43 insertions(+), 108 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c
 
-- 
2.43.0
 
 

Reply via email to