On 7/25/23 20:31, Jeff Law via Gcc-patches wrote:


On 7/25/23 05:24, Jivan Hakobyan wrote:
Hi.

I re-ran the benchmarks and, fortunately, got the same gains.
I also compared leela's code and figured out the reason.

Actually, my patch and Manolis's do the same thing. The only difference is the order of execution.
But shouldn't your patch also allow for, at the least, the potential to pull the fp+offset computation out of a loop?  I'm pretty sure Manolis's patch can't do that.
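
As a rough source-level analogy of "pulling the fp+offset computation out of a loop" (a sketch only, not taken from either patch; the names `ARRAY_OFFSET`, `sum_recompute` and `sum_hoisted` are hypothetical, and the real transformation happens on RTL, not on C source):

```c
#include <stddef.h>

/* Hypothetical layout: an array of N longs lives ARRAY_OFFSET words
   above the frame base that fp points to. */
enum { ARRAY_OFFSET = 8, N = 16 };

/* The address fp + ARRAY_OFFSET is formed again in every iteration,
   even though neither operand changes inside the loop. */
long sum_recompute(const long *fp)
{
    long sum = 0;
    for (size_t i = 0; i < N; i++) {
        const long *base = fp + ARRAY_OFFSET;  /* loop-invariant */
        sum += base[i];
    }
    return sum;
}

/* Same result, but the invariant address is computed once before the loop. */
long sum_hoisted(const long *fp)
{
    const long *base = fp + ARRAY_OFFSET;
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += base[i];
    return sum;
}
```

Both functions return the same value for any frame of at least `ARRAY_OFFSET + N` elements; the second simply makes the hoisting explicit.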

Because f-m-o runs after register allocation, it cannot eliminate the redundant move of 'sp' to another register.
Actually that's supposed to be handled by a different patch that should already be upstream.  Specifically:

commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
Author: Manolis Tsamis <manolis.tsa...@vrull.eu>
Date:   Thu May 25 13:44:41 2023 +0200

    cprop_hardreg: Enable propagation of the stack pointer if possible

    Propagation of the stack pointer in cprop_hardreg is currently
    forbidden in all cases, due to maybe_mode_change returning NULL.
    Relax this restriction and allow propagation when no mode change is
    requested.

    gcc/ChangeLog:

            * regcprop.cc (maybe_mode_change): Enable stack pointer
            propagation.

I think there were a couple of follow-ups.  But that's the key change that should allow propagation of copies from the stack pointer and thus eliminate the mov gpr,sp instructions.  If that's not happening, then it's worth investigating why.
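
To make the idea concrete, here is a toy model of forward copy propagation over a straight-line block. It is only a sketch under stated assumptions: the `insn` type, the opcodes and `propagate_copies` are hypothetical and are not how regcprop.cc is written; the register numbers follow the RISC-V ABI names (sp = x2, a0 = x10, a5 = x15).

```c
#include <stddef.h>

enum { NREGS = 32, REG_SP = 2, NO_REG = -1 };

typedef enum { OP_MOVE, OP_LOAD } opcode;

/* OP_MOVE: dst = src.  OP_LOAD: dst = mem[src + offset]. */
typedef struct {
    opcode op;
    int dst;
    int src;
    long offset;
} insn;

/* Rewrite source operands to use the original register of a known copy.
   Returns the number of operands that were rewritten. */
size_t propagate_copies(insn *code, size_t n)
{
    int copy_of[NREGS];
    size_t rewritten = 0;

    for (int r = 0; r < NREGS; r++)
        copy_of[r] = NO_REG;

    for (size_t i = 0; i < n; i++) {
        insn *in = &code[i];

        /* If src is a known copy, read from the original register instead. */
        if (copy_of[in->src] != NO_REG) {
            in->src = copy_of[in->src];
            rewritten++;
        }

        /* Writing dst kills every copy relation that involves dst. */
        copy_of[in->dst] = NO_REG;
        for (int r = 0; r < NREGS; r++)
            if (copy_of[r] == in->dst)
                copy_of[r] = NO_REG;

        /* A move creates a new copy relation. */
        if (in->op == OP_MOVE && in->dst != in->src)
            copy_of[in->dst] = in->src;
    }
    return rewritten;
}
```

For a block such as `mv a5,sp; ld a0,8(a5); ld a1,16(a5)`, both loads end up addressing through sp directly, which leaves the initial move dead and removable by a later pass.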


Besides that, I have checked the build failure on x264_r. It is already fixed in the third version.
Yea, this was a problem with re-recognition.  I think it was fixed by:

commit ecfa870ff29d979bd2c3d411643b551f2b6915b0
Author: Vineet Gupta <vine...@rivosinc.com>
Date:   Thu Jul 20 11:15:37 2023 -0700

    RISC-V: optim const DF +0.0 store to mem [PR/110748]

    Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")

    DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.
[ ... ]
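
The "bitwise all zeros" claim in that commit message is easy to check from C. A minimal sketch, assuming `double` is IEEE 754 binary64 (as it is on RISC-V); the helper name `double_bits` is hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 64-bit pattern of a double.
   memcpy is used to avoid strict-aliasing problems. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

`double_bits(0.0)` is 0, so storing the integer zero register produces the same memory contents; `double_bits(-0.0)` has the sign bit set, which is why the optimization applies only to +0.0.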


So I think the big question WRT your patch is whether it still helps the case where we weren't pulling the fp+offset computation out of a loop.

I have some numbers for f-m-o v3 vs this. Attached here (rather than inline, to avoid Thunderbird mangling the text formatting)
benchmark        workload #  upstream        upstream +      reduction  upstream +      reduction
                             g54e54f77c1     f-m-o                      fold-fp-off

500.perlbench_r  0           1217932817476   1217884553366   0.004%     1217928953834   0.000%
                 1           743757241201    743655528133    0.014%     743695820426    0.008%
                 2           703455646090    703423559298    0.005%     703455296251    0.000%
502.gcc_r        0           195004369104    194973478945    0.016%     194984188400    0.010%
                 1           232719938778    232688491113    0.014%     232692379085    0.012%
                 2           223443280459    223413616368    0.013%     223424151848    0.009%
                 3           186233704624    186206516421    0.015%     186231137616    0.001%
                 4           287406394232    287378870279    0.010%     287403707466    0.001%
503.bwaves_r     0           316194043679    316194043679    0.000%     316194043662    0.000%
                 1           499293490380    499293490380    0.000%     499293490363    0.000%
                 2           389365401615    389365401615    0.000%     389365401598    0.000%
                 3           473514310679    473514310679    0.000%     473514310662    0.000%
505.mcf_r        0           689258694902    689254740344    0.001%     689258694887    0.000%
507.cactuBSSN_r  0           3966612364613   3966498234698   0.003%     3966612365068   0.000%
508.namd_r       0           1903766272166   1903766271701   0.000%     1903765987301   0.000%
510.parest_r     0           3512678127316   3512676752062   0.000%     3512677505662   0.008%
511.povray_r     0           3036725558618   3036722265149   0.000%     3036725556997   0.000%
519.lbm_r        0           1134454304533   1134454304533   0.000%     1134454304518   0.000%
520.omnetpp_r    0           1001937885126   1001937884542   0.000%     1001937883931   0.000%
521.wrf_r        0           3959642601629   3959541912013   0.003%     3959642615086   0.000%
523.xalancbmk_r  0           1065004269065   1064981413043   0.002%     1065004132070   0.000%
525.x264_r       0           496492857533    496459367582    0.007%     496477988435    0.003%
                 1           1891248078083   1891222197535   0.001%     1890990911614   0.014%
                 2           1815609267498   1815561397105   0.003%     1815341248007   0.015%
526.blender_r    0           1672203767444   1671549923427   0.039%     1672224626743   -0.001%
527.cam4_r       0           2326424925038   2320567166886   0.252%     2326333566227   0.004%     <-
531.deepsjeng_r  0           1668993359340   1662816376544   0.370%     1668993353038   0.000%     <-
538.imagick_r    0           3260965672876   3260965672712   0.000%     3260965672777   0.000%
541.leela_r      0           2034139863891   2034101807341   0.002%     2026647843672   0.368%     <--
544.nab_r        0           1566465507272   1565420628706   0.067%     1566465379674   0.000%
548.exchange2_r  0           2228112071994   2228109962469   0.000%     2228114278251   0.000%
549.fotonik3d_r  0           2255238867247   2255238867246   0.000%     2255238865924   0.000%
554.roms_r       0           2653150555486   2651884870455   0.048%     2653150554877   0.000%
557.xz_r         0           367892301169    367892301167    0.000%     367892301154    0.000%
                 1           979549393200    979549393198    0.000%     979549393185    0.000%
                 2           525066235331    525066235329    0.000%     525066235316    0.000%
997.specrand_fr  0           453112389       453112389       0.000%     453112374       0.000%
999.specrand_ir  0           453112389       453112389       0.000%     453112374       0.000%
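
For anyone rechecking the percentages: they appear consistent with (upstream - patched) / upstream, expressed as a percentage, so a positive value means the patched count is lower. A small sketch under that assumption (the helper name `percent_reduction` is hypothetical):

```c
#include <stdint.h>

/* Relative reduction of `patched` versus `baseline`, in percent.
   Positive means the patched count is lower than the baseline.
   The counts in the table are below 2^53, so the conversions to
   double are exact. */
double percent_reduction(uint64_t baseline, uint64_t patched)
{
    return ((double)baseline - (double)patched) / (double)baseline * 100.0;
}
```

For example, `percent_reduction(2034139863891, 2026647843672)` for 541.leela_r with fold-fp-off comes out at roughly 0.368, matching the row marked `<--`.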
