On 6/5/25 20:09, 钟居哲 wrote:
> Hi Vineet. The series of patches LGTM from my side.

Thx.

> But I wonder whether you would like to optimize VXRM which is using
> mode-switching too.

Was not planning to :-)

> I saw in SPEC 2017 (624 x264) that
> csrwi vxrm is called multiple times.

Is it now ? I think in gcc-15 Jeff added a trick to hoist it out of the loop,
all the way to the top of the function - I think it was pixel_avg in x264.

     2024-10-30 a65e1487cda9 [RISC-V] Aggressively hoist VXRM assignments  

I haven't checked recently though !
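
One way to re-check is to count the relevant mnemonics in the disassembly.
Below is a minimal sketch in Python, not the script used for the numbers
quoted further down. It assumes `objdump -d` style text with the raw
encoding column present ("<addr>: <hex> <mnemonic> <operands>"); the helper
name count_csr_ops is hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "<addr>:\t<hex encoding>\t<mnemonic>\t<operands>".
# Lines without the raw encoding column are not handled by this sketch.
INSN_RE = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]+\s+(\S+)(?:\s+(\S+))?")

FRM_OPS = {"frrm", "fsrm", "fsrmi"}   # FRM read / write pseudo-instructions
CSR_WRITES = {"csrw", "csrwi"}        # generic CSR writes, filtered on vxrm


def count_csr_ops(disassembly: str) -> Counter:
    """Count FRM accesses and VXRM writes in disassembly text."""
    counts = Counter()
    for line in disassembly.splitlines():
        m = INSN_RE.match(line)
        if not m:
            continue  # symbol labels, blank lines, section headers
        mnemonic, operands = m.group(1), m.group(2) or ""
        if mnemonic in FRM_OPS:
            counts[mnemonic] += 1
        elif mnemonic in CSR_WRITES and operands.split(",")[0] == "vxrm":
            counts[mnemonic + " vxrm"] += 1
    return counts
```

Feeding it the disassembly of a single function (say pixel_avg) and looking
at counts["csrwi vxrm"] would show whether the write is still emitted more
than once there. This is a static count only; it says nothing about how often
the instruction executes.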

Cheers,
-Vineet


> --------------------------------------------------------------------------------
> juzhe.zh...@rivai.ai
>
>      
>     *From:* Vineet Gupta <mailto:vine...@rivosinc.com>
>     *Date:* 2025-06-06 08:04
>     *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
>     *CC:* gnu-toolchain <mailto:gnu-toolch...@rivosinc.com>; Jeff Law
>     <mailto:jeffreya...@gmail.com>; Robin Dapp <mailto:rdapp....@gmail.com>;
>     Juzhe Zhong <mailto:juzhe.zh...@rivai.ai>; Pan Li
>     <mailto:pan2...@intel.com>; Kito Cheng <mailto:kito.ch...@sifive.com>;
>     Vineet Gupta <mailto:vine...@rivosinc.com>
>     *Subject:* [PATCH v2 0/5] RISC-V: frm state-machine improvements
>     Changes since v1
>       - Dropped removal of TARGET_MODE_AFTER
>       - NFC changes to last 2 patches, with reattribution to PRs they address
>     separately.
>      
>     Hi,
>      
>     This came out of Rivos perf team reporting (shoutout to Siavash) that
>     some of the SPEC2017 workloads had FRM wiggles when none were needed.
>     The writes in particular could be expensive.
>      
>     I started with a reduced test for PR/119164 from
>     blender:node_texture_util.c.
>      
>     However, in trying to understand it (and after a botched rewrite of the
>     whole thing), it turned out that a lot of the code was just unnecessary,
>     leading to more complexity than warranted. As a result there are mostly
>     deletions here, and the actual improvements come from just a few lines
>     of changes.
>      
>     I've verified each patch incrementally with
>     - Testsuite run (unchanged, 1 unexpected pass
>     gcc.target/riscv/rvv/autovec/pr119114.c)
>     - SPEC build
>     - Static analysis of FRM read/write insns emitted in all of SPEC binaries.
>     - There's BPI data for some of this too, but the delta there is not
>        significant as this could really be uarch specific.
>      
>     Here's the result for static analysis.
>      
>                    1. revert-confluence   2. remove-edge-insert  4-fewer-frm-restore    5-call-backtrack
>                    ---------------------  ---------------------  ---------------------  ---------------------
>                      frrm  fsrmi   fsrm     frrm  fsrmi   fsrm     frrm  fsrmi   fsrm     frrm  fsrmi   fsrm
>      perlbench_r       42      0      4       42      0      4       17      0      1       17      0      1
>         cpugcc_r      167      0     17      167      0     17       11      0      0       11      0      0
>         bwaves_r       16      0      1       16      0      1       16      0      1       16      0      1
>            mcf_r       11      0      0       11      0      0       11      0      0       11      0      0
>     cactusBSSN_r       79      0     27       76      0     27       19      0      1       19      0      1
>           namd_r      119      0     63      119      0     63       14      0      1       14      0      1
>         parest_r      218      0    114      168      0    114       24      0      1       24      0      1
>         povray_r      123      1     17      123      1     17       26      1      6       26      1      6
>            lbm_r        6      0      0        6      0      0        6      0      0        6      0      0
>        omnetpp_r       17      0      1       17      0      1       17      0      1       17      0      1
>            wrf_r     2287     13   1956     2287     13   1956     1268     13   1603      613     13     82
>       cpuxalan_r       17      0      1       17      0      1       17      0      1       17      0      1
>         ldecod_r       11      0      0       11      0      0       11      0      0       11      0      0
>           x264_r       14      0      1       14      0      1       11      0      0       11      0      0
>        blender_r      724     12    182      724     12    182       61     12     42       39     12     16
>           cam4_r      324     13    169      324     13    169       45     13     20       40     13     17
>      deepsjeng_r       11      0      0       11      0      0       11      0      0       11      0      0
>        imagick_r      265     16     34      265     16     34      132     16     25       33     16     18
>          leela_r       12      0      0       12      0      0       12      0      0       12      0      0
>            nab_r       13      0      1       13      0      1       13      0      1       13      0      1
>      exchange2_r       16      0      1       16      0      1       16      0      1       16      0      1
>      fotonik3d_r       20      0     11       20      0     11       19      0      1       19      0      1
>           roms_r       33      0     23       33      0     23       21      0      1       21      0      1
>             xz_r        6      0      0        6      0      0        6      0      0        6      0      0
>                    ---------------------  ---------------------  ---------------------  ---------------------
>                      4551     55   2623     4498     55   2623     1804     55   1707     1023     55    150
>                    ---------------------  ---------------------  ---------------------  ---------------------
>                             7229                   7176                   3566                   1228
>                    ---------------------  ---------------------  ---------------------  ---------------------
>      
>     Note that wrf still has a ridiculously high number of FRM ops, which
>     will be tackled as a follow-up.
>      
>     Please review.
>      
>     Thx,
>     -Vineet
>      
>     Vineet Gupta (5):
>       emit-rtl: document next_nonnote_nondebug_insn_bb () can breach into
>         next BB
>       RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE
>       RISC-V: frm/mode-switch: remove dubious frm edge insertion before
>         call_insn
>       RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition
>         [PR119164]
>       RISC-V: frm/mode-switch: robustify call_insn backtracking [PR120203]
>      
>     gcc/config/riscv/riscv.cc                     | 121 +++---------------
>     gcc/emit-rtl.cc                               |   6 +-
>     .../rvv/base/float-point-dynamic-frm-74.c     |   2 +-
>     .../gcc.target/riscv/rvv/base/pr119164.c      |  22 ++++
>     4 files changed, 43 insertions(+), 108 deletions(-)
>     create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c
>      
>     -- 
>     2.43.0
>      
>      
>
