On 7/25/23 20:31, Jeff Law via Gcc-patches wrote:
On 7/25/23 05:24, Jivan Hakobyan wrote:
Hi.
I re-ran the benchmarks and, fortunately, got the same improvement.
I also compared leela's code and figured out the reason.
Actually, my patch and Manolis's do the same thing. The only
difference is the order in which they run.
But shouldn't your patch also allow, at the least, for the potential
to pull the fp+offset computation out of a loop? I'm pretty sure
Manolis's patch can't do that.
Because f-m-o runs after register allocation, it cannot
eliminate the redundant move of 'sp' to another register.
Actually that's supposed to be handled by a different patch that
should already be upstream. Specifically:
commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
Author: Manolis Tsamis <manolis.tsa...@vrull.eu>
Date: Thu May 25 13:44:41 2023 +0200
cprop_hardreg: Enable propagation of the stack pointer if possible
Propagation of the stack pointer in cprop_hardreg is currenty
forbidden in all cases, due to maybe_mode_change returning NULL.
Relax this restriction and allow propagation when no mode change is
requested.
gcc/ChangeLog:
* regcprop.cc (maybe_mode_change): Enable stack pointer
propagation.
I think there were a couple of follow-ups. But that's the key change
that should allow propagation of copies from the stack pointer and
thus eliminate the mov gpr,sp instructions. If that's not happening,
then it's worth investigating why.
Besides that, I have checked the build failure on x264_r. It is
already fixed in the third version.
Yea, this was a problem with re-recognition. I think it was fixed by:
commit ecfa870ff29d979bd2c3d411643b551f2b6915b0
Author: Vineet Gupta <vine...@rivosinc.com>
Date: Thu Jul 20 11:15:37 2023 -0700
RISC-V: optim const DF +0.0 store to mem [PR/110748]
Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
DF +0.0 is bitwise all zeros so int x0 store to mem can be
used to optimize it.
[ ... ]
So I think the big question WRT your patch is does it still help the
case where we weren't pulling the fp+offset computation out of a loop.
I have some numbers for f-m-o v3 vs this. Attached here (vs. inline, to
avoid Thunderbird mangling the text formatting):
benchmark        workload  upstream       upstream +              upstream +
                 #         g54e54f77c1    f-m-o                   fold-fp-off
500.perlbench_r  0         1217932817476  1217884553366   0.004%  1217928953834   0.000%
                 1          743757241201   743655528133   0.014%   743695820426   0.008%
                 2          703455646090   703423559298   0.005%   703455296251   0.000%
502.gcc_r        0          195004369104   194973478945   0.016%   194984188400   0.010%
                 1          232719938778   232688491113   0.014%   232692379085   0.012%
                 2          223443280459   223413616368   0.013%   223424151848   0.009%
                 3          186233704624   186206516421   0.015%   186231137616   0.001%
                 4          287406394232   287378870279   0.010%   287403707466   0.001%
503.bwaves_r     0          316194043679   316194043679   0.000%   316194043662   0.000%
                 1          499293490380   499293490380   0.000%   499293490363   0.000%
                 2          389365401615   389365401615   0.000%   389365401598   0.000%
                 3          473514310679   473514310679   0.000%   473514310662   0.000%
505.mcf_r        0          689258694902   689254740344   0.001%   689258694887   0.000%
507.cactuBSSN_r  0         3966612364613  3966498234698   0.003%  3966612365068   0.000%
508.namd_r       0         1903766272166  1903766271701   0.000%  1903765987301   0.000%
510.parest_r     0         3512678127316  3512676752062   0.000%  3512677505662   0.008%
511.povray_r     0         3036725558618  3036722265149   0.000%  3036725556997   0.000%
519.lbm_r        0         1134454304533  1134454304533   0.000%  1134454304518   0.000%
520.omnetpp_r    0         1001937885126  1001937884542   0.000%  1001937883931   0.000%
521.wrf_r        0         3959642601629  3959541912013   0.003%  3959642615086   0.000%
523.xalancbmk_r  0         1065004269065  1064981413043   0.002%  1065004132070   0.000%
525.x264_r       0          496492857533   496459367582   0.007%   496477988435   0.003%
                 1         1891248078083  1891222197535   0.001%  1890990911614   0.014%
                 2         1815609267498  1815561397105   0.003%  1815341248007   0.015%
526.blender_r    0         1672203767444  1671549923427   0.039%  1672224626743  -0.001%
527.cam4_r       0         2326424925038  2320567166886   0.252%  2326333566227   0.004% <-
531.deepsjeng_r  0         1668993359340  1662816376544   0.370%  1668993353038   0.000% <-
538.imagick_r    0         3260965672876  3260965672712   0.000%  3260965672777   0.000%
541.leela_r      0         2034139863891  2034101807341   0.002%  2026647843672   0.368% <--
544.nab_r        0         1566465507272  1565420628706   0.067%  1566465379674   0.000%
548.exchange2_r  0         2228112071994  2228109962469   0.000%  2228114278251   0.000%
549.fotonik3d_r  0         2255238867247  2255238867246   0.000%  2255238865924   0.000%
554.roms_r       0         2653150555486  2651884870455   0.048%  2653150554877   0.000%
557.xz_r         0          367892301169   367892301167   0.000%   367892301154   0.000%
                 1          979549393200   979549393198   0.000%   979549393185   0.000%
                 2          525066235331   525066235329   0.000%   525066235316   0.000%
997.specrand_fr  0             453112389      453112389   0.000%      453112374   0.000%
999.specrand_ir  0             453112389      453112389   0.000%      453112374   0.000%
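
For anyone re-checking the percentage columns: they appear to be the
reduction relative to the upstream count, i.e. (upstream - patched) /
upstream * 100. The mail does not state the formula, so that is an
assumption inferred from the numbers, and the function name below is
made up for illustration. A minimal Python sketch:

```python
def improvement_pct(baseline: int, patched: int) -> float:
    """Percent reduction of `patched` relative to `baseline`.

    Positive means the patched build has the lower count. The formula is
    an assumption inferred from the table, not stated in the mail.
    """
    return (baseline - patched) / baseline * 100.0


# (upstream, patched) counts copied from the table above.
rows = {
    "531.deepsjeng_r, f-m-o": (1668993359340, 1662816376544),
    "541.leela_r, fold-fp-off": (2034139863891, 2026647843672),
    "526.blender_r, fold-fp-off": (1672203767444, 1672224626743),
}

for name, (base, new) in rows.items():
    print(f"{name}: {improvement_pct(base, new):.3f}%")
```

For these three rows it gives 0.370%, 0.368% and -0.001%, matching the
table entries; the negative blender value means the fold-fp-off build
had a slightly higher count than upstream.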