On 5/13/25 10:07, Vineet Gupta wrote: > > > On 5/10/25 07:20, Jeff Law wrote: >> On 5/9/25 2:27 PM, Vineet Gupta wrote: >>> This is effectively reverting e5d1f538bb7d >>> "(RISC-V: Allow different dynamic floating point mode to be merged)" >>> while retaining the testcase. >>> >>> The change itself is valid, however it obfuscates the deficiencies in >>> current frm mode switching code. >>> >>> Also for a SPEC2017 -Ofast -march=rv64gcv build, it ends up generating >>> net more FRM restores (writes) vs. the rest of this changeset. >>> >>> gcc/ChangeLog: >>> >>> * config/riscv/riscv.cc (riscv_dynamic_frm_mode_p): Remove. >>> (riscv_mode_confluence): Ditto. >>> (TARGET_MODE_CONFLUENCE): Ditto. >> Unsure on this one. >> >> >> >>> - /* FRM_DYN, FRM_DYN_CALL and FRM_DYN_EXIT are all compatible. >>> - Although we already try to set the mode needed to FRM_DYN after a >>> - function call, there are still some corner cases where both FRM_DYN >>> - and FRM_DYN_CALL may appear on incoming edges. */ >> Do we have an understanding of these corner cases? That's my biggest >> worry with simply removing this code.
Yes we do. 1. First argument is that W/ or W/o the patch, we have same results in the end. (a) With Confluence patch + my changes on top we have the following a1-confluence a2-rem-edge-insert a3-remove-mode-after a4-reduce-frm-restore a5-call-backtrack -------------------------------------------- frrm fsrmi fsrm perlbench_r 17 0 1 17 0 1 cpugcc_r 11 0 0 11 0 0 bwaves_r 16 0 1 16 0 1 mcf_r 11 0 0 11 0 0 cactusBSSN_r 19 0 1 19 0 1 namd_r 14 0 1 14 0 1 parest_r 24 0 1 24 0 1 povray_r 26 1 6 26 1 6 lbm_r 6 0 0 6 0 0 omnetpp_r 17 0 1 17 0 1 wrf_r 411 13 164 613 13 82 cpuxalan_r 17 0 1 17 0 1 ldecod_r 11 0 0 11 0 0 x264_r 11 0 0 11 0 0 blender_r 37 12 16 39 12 16 cam4_r 37 13 17 40 13 17 deepsjeng_r 11 0 0 11 0 0 imagick_r 33 16 18 33 16 18 leela_r 12 0 0 12 0 0 nab_r 13 0 1 13 0 1 exchange2_r 16 0 1 16 0 1 fotonik3d_r 19 0 1 19 0 1 roms_r 21 0 1 21 0 1 xz_r 6 0 0 6 0 0 ----------------------------------- 816 55 232 1023 55 150 (b) Revert the confluence patch + my changes, we still see the final result > a1-confluence b1-revert-confluence > b2-rem-edge-insert b4-reduce-frm-restores b5-call-backtrack > > b3-remove-mode-after b6-readd-confluence > > --------------------------------------------------------------------------------------------------------------- > perlbench_r 17 0 1 42 0 4 42 0 4 > 17 0 1 17 0 1 > cpugcc_r 11 0 0 167 0 17 167 0 17 > 11 0 0 11 0 0 > bwaves_r 16 0 1 16 0 1 16 0 1 > 16 0 1 16 0 1 > mcf_r 11 0 0 11 0 0 11 0 0 > 11 0 0 11 0 0 > cactusBSSN_r 19 0 1 79 0 27 76 0 27 > 19 0 1 19 0 1 > namd_r 14 0 1 119 0 63 119 0 63 > 14 0 1 14 0 1 > parest_r 24 0 1 218 0 114 168 0 114 > 24 0 1 24 0 1 > povray_r 26 1 6 123 1 17 123 1 17 > 26 1 6 26 1 6 > lbm_r 6 0 0 6 0 0 6 0 0 > 6 0 0 6 0 0 > omnetpp_r 17 0 1 17 0 1 17 0 1 > 17 0 1 17 0 1 > wrf_r 411 13 164 2287 13 1956 2287 13 1956 > 1268 13 1603 613 13 82 > cpuxalan_r 17 0 1 17 0 1 17 0 1 > 17 0 1 17 0 1 > ldecod_r 11 0 0 11 0 0 11 0 0 > 11 0 0 11 0 0 > x264_r 11 0 0 14 0 1 14 0 1 > 11 0 0 11 0 0 > blender_r 37 12 16 724 12 182 724 12 182 > 61 12 42 39 12 16 > cam4_r 37 13 17 324 13 169 324 13 169 > 45 13 20 40 13 17 > deepsjeng_r 11 0 0 11 0 0 11 0 0 > 11 0 0 11 0 0 > imagick_r 33 16 18 265 16 34 265 16 34 > 132 16 25 33 16 18 > leela_r 12 0 0 12 0 0 12 0 0 > 12 0 0 12 0 0 > nab_r 13 0 1 13 0 1 13 0 1 > 13 0 1 13 0 1 > exchange2_r 16 0 1 16 0 1 16 0 1 > 16 0 1 16 0 1 > fotonik3d_r 19 0 1 20 0 11 20 0 11 > 19 0 1 19 0 1 > roms_r 21 0 1 33 0 23 33 0 23 > 21 0 1 21 0 1 > xz_r 6 0 0 6 0 0 6 0 0 > 6 0 0 6 0 0 > > -------------------------------------------------------------------------------------------------------- > 816 55 232 4551 55 2623 4498 55 2623 > 1804 55 1707 1023 55 150 So essentially that change is no longer yielding any benefits. 2. Now back to the first and last columns (same for both tables) - (a). We see fewer FRM writes vs. confluence change, so that should be undisputed. (b). There are more FRM reads vs. confluence and that needs explaining The bulk of delta is from wrf and moreover from a single function solve_em which I dug into. frrm fsrmi fsrm b5: my changes (W/ or W/o confluence) solve_em_ 555 1 50 a1: confluence only solve_em_ 359 1 132 solve_em has ~960 calls to free () $ grep -c "__free" objd-wrf-lto-b5 objd-wrf-lto-a1 objd-wrf-lto-b5:964 # my changes objd-wrf-lto-a1:967 # confluence only If I search for a multi-line pattern with call __free followed by frrm (as is expected in current codegen), and delete that pattern we see following $ grep -c "__free" objd-wrf-lto-b5 objd-wrf-lto-a1 objd-wrf-lto-b5:634 # my changes objd-wrf-lto-a1:827 # confluence only $ egrep -c frrm objd-wrf-lto-b5 objd-wrf-lto-a1 objd-wrf-lto-b5:225 # my changes objd-wrf-lto-a1:219 # confluence only IOW there were more call + frm pairs in new code vs. with old. So this all comes down to PR/120203 where FRM reads are missing some function calls in certain conditions and new code increases the number of reads for correctness. 3. Having said that the fact that this function has 1 static RM fsrmi and hundreds of save/restores is just ridiculous and that is a harder problem to solve and captured in PR/120245 Thx, -Vineet