On 6/5/25 20:09, 钟居哲 wrote:
> Hi, Vineet. The series of patches LGTM from my side.
Thx.
> But I wonder whether you would like to optimize VXRM which is using
> mode-switching too.
Was not planning to :-)
> I saw in SPEC 2017 624 x264 that
> csrwi vxrm is called multiple times.
Is it now? I think in gcc-15 Jeff added a trick to hoist it out of the loop,
all the way to the top of the function; I think it was for pixel_avg in x264.
2024-10-30 a65e1487cda9 [RISC-V] Aggressively hoist VXRM assignments
I haven't checked recently though!
Cheers,
-Vineet
> --------------------------------------------------------------------------------
> juzhe.zh...@rivai.ai
>
>
> *From:* Vineet Gupta <mailto:vine...@rivosinc.com>
> *Date:* 2025-06-06 08:04
> *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
> *CC:* gnu-toolchain <mailto:gnu-toolch...@rivosinc.com>; Jeff Law
> <mailto:jeffreya...@gmail.com>; Robin Dapp <mailto:rdapp....@gmail.com>;
> Juzhe Zhong <mailto:juzhe.zh...@rivai.ai>; Pan Li
> <mailto:pan2...@intel.com>; Kito Cheng <mailto:kito.ch...@sifive.com>;
> Vineet Gupta <mailto:vine...@rivosinc.com>
> *Subject:* [PATCH v2 0/5] RISC-V: frm state-machine improvements
> Changes since v1
> - Dropped removal of TARGET_MODE_AFTER
> - NFC changes to last 2 patches, with reattribution to PRs they address
> separately.
>
> Hi,
>
> This came out of the Rivos perf team reporting (shoutout to Siavash) that
> some of the SPEC2017 workloads had FRM wiggles when none were
> needed. The writes in particular could be expensive.
>
> I started with a reduced test for PR/119164 from
> blender:node_texture_util.c.
>
> However, in trying to understand it (including a botched rewrite of the
> whole thing), it turned out that a lot of the code was just unnecessary,
> leading to more complexity than warranted. As a result there are more
> deletions than additions here, and the actual improvements come from just
> a few lines of changes.
>
> I've verified each patch incrementally with
>  - Testsuite run (unchanged, 1 unexpected pass:
>    gcc.target/riscv/rvv/autovec/pr119114.c)
>  - SPEC build
>  - Static analysis of FRM read/write insns emitted in all of the SPEC binaries.
>  - There's BPI data for some of this too, but the delta there is not
>    significant as this could really be uarch specific.
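For anyone wanting to reproduce this kind of count, here is a minimal sketch. It is not the script used for the numbers in this thread. It assumes GNU objdump-style disassembly text (address, encoding word, mnemonic, operands) and simply tallies the frrm/fsrmi/fsrm mnemonics; the name count_frm_ops and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# frrm reads FRM; fsrm and fsrmi write it.
FRM_OPS = ("frrm", "fsrmi", "fsrm")

# Assumed line layout: "<addr>:\t<hex encoding>\t<mnemonic>\t<operands>".
# Other disassemblers (or byte-wise encodings) would need a different pattern.
_INSN = re.compile(r"^\s*[0-9a-f]+:\s+[0-9a-f]+\s+(\S+)")


def count_frm_ops(disassembly: str) -> Counter:
    """Count FRM read/write instructions in disassembly text."""
    counts = Counter({op: 0 for op in FRM_OPS})
    for line in disassembly.splitlines():
        m = _INSN.match(line)
        if m and m.group(1) in FRM_OPS:
            counts[m.group(1)] += 1
    return counts


# Illustrative input only; not taken from any SPEC binary.
sample = (
    "0000000000010078 <foo>:\n"
    "   10078:\t002027f3          \tfrrm\ta5\n"
    "   1007c:\t0020d073          \tfsrmi\t1\n"
    "   10080:\t00279073          \tfsrm\ta5\n"
    "   10084:\t00008067          \tret\n"
)

print(dict(count_frm_ops(sample)))
```

Matching on the mnemonic field rather than grepping whole lines avoids counting symbol names that happen to contain "fsrm".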
>
> Here's the result for static analysis.
>
>              1. revert-confluence   2. remove-edge-insert  4-fewer-frm-restore    5-call-backtrack
>              ---------------------  ---------------------  ---------------------  ---------------------
>                frrm   fsrmi   fsrm    frrm   fsrmi   fsrm    frrm   fsrmi   fsrm    frrm   fsrmi   fsrm
> perlbench_r      42       0      4      42       0      4      17       0      1      17       0      1
> cpugcc_r        167       0     17     167       0     17      11       0      0      11       0      0
> bwaves_r         16       0      1      16       0      1      16       0      1      16       0      1
> mcf_r            11       0      0      11       0      0      11       0      0      11       0      0
> cactusBSSN_r     79       0     27      76       0     27      19       0      1      19       0      1
> namd_r          119       0     63     119       0     63      14       0      1      14       0      1
> parest_r        218       0    114     168       0    114      24       0      1      24       0      1
> povray_r        123       1     17     123       1     17      26       1      6      26       1      6
> lbm_r             6       0      0       6       0      0       6       0      0       6       0      0
> omnetpp_r        17       0      1      17       0      1      17       0      1      17       0      1
> wrf_r          2287      13   1956    2287      13   1956    1268      13   1603     613      13     82
> cpuxalan_r       17       0      1      17       0      1      17       0      1      17       0      1
> ldecod_r         11       0      0      11       0      0      11       0      0      11       0      0
> x264_r           14       0      1      14       0      1      11       0      0      11       0      0
> blender_r       724      12    182     724      12    182      61      12     42      39      12     16
> cam4_r          324      13    169     324      13    169      45      13     20      40      13     17
> deepsjeng_r      11       0      0      11       0      0      11       0      0      11       0      0
> imagick_r       265      16     34     265      16     34     132      16     25      33      16     18
> leela_r          12       0      0      12       0      0      12       0      0      12       0      0
> nab_r            13       0      1      13       0      1      13       0      1      13       0      1
> exchange2_r      16       0      1      16       0      1      16       0      1      16       0      1
> fotonik3d_r      20       0     11      20       0     11      19       0      1      19       0      1
> roms_r           33       0     23      33       0     23      21       0      1      21       0      1
> xz_r              6       0      0       6       0      0       6       0      0       6       0      0
>              ---------------------  ---------------------  ---------------------  ---------------------
>                4551      55   2623    4498      55   2623    1804      55   1707    1023      55    150
>              ---------------------  ---------------------  ---------------------  ---------------------
>                               7229                   7176                   3566                   1228
>              ---------------------  ---------------------  ---------------------  ---------------------
>
> Note that wrf still has a ridiculously high number of FRM ops, which will be
> tackled as a follow-up.
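As a quick sanity check on the totals, the combined FRM op count per stage and the overall reduction can be worked out from the per-insn totals row. A small sketch; the dictionary keys just reuse the column labels from the table, and the helper names combined and reduction_pct are hypothetical.

```python
# Per-stage totals from the table, as (frrm, fsrmi, fsrm).
totals = {
    "1. revert-confluence": (4551, 55, 2623),
    "2. remove-edge-insert": (4498, 55, 2623),
    "4-fewer-frm-restore": (1804, 55, 1707),
    "5-call-backtrack": (1023, 55, 150),
}


def combined(per_stage):
    """Sum frrm + fsrmi + fsrm for each stage."""
    return {stage: sum(ops) for stage, ops in per_stage.items()}


def reduction_pct(before, after):
    """Percentage drop going from `before` to `after`."""
    return 100.0 * (before - after) / before


sums = combined(totals)
first = sums["1. revert-confluence"]
last = sums["5-call-backtrack"]
print(sums)
print(f"{reduction_pct(first, last):.1f}% fewer static FRM ops")
```

These are static instruction counts only, so they say nothing about how often each instruction executes at run time.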
>
> Please review.
>
> Thx,
> -Vineet
>
> Vineet Gupta (5):
>   emit-rtl: document next_nonnote_nondebug_insn_bb () can breach into
>     next BB
>   RISC-V: frm/mode-switch: remove TARGET_MODE_CONFLUENCE
>   RISC-V: frm/mode-switch: remove dubious frm edge insertion before
>     call_insn
>   RISC-V: frm/mode-switch: Reduce FRM restores on DYN transition
>     [PR119164]
>   RISC-V: frm/mode-switch: robustify call_insn backtracking [PR120203]
>
>  gcc/config/riscv/riscv.cc                     | 121 +++---------------
>  gcc/emit-rtl.cc                               |   6 +-
>  .../rvv/base/float-point-dynamic-frm-74.c     |   2 +-
>  .../gcc.target/riscv/rvv/base/pr119164.c      |  22 ++++
>  4 files changed, 43 insertions(+), 108 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr119164.c
>
> --
> 2.43.0