Sorry, the last one wasn't correct: register merge vs. yours

Max Increase:

SGPRS: 80 -> 96 (20.00 %) (in shaders/deusex_mankind/7c39f71090a9db19ac2e1542ea12804ae6c6495b_4864.shader_test)
VGPRS: 64 -> 84 (31.25 %) (in shaders/dirtrally/0859b69789591d7046e211400b1edd9a7cfca734_742.shader_test)
Spilled SGPRs: 0 -> 16 (0.00 %) (in shaders/deusex_mankind/d64e2084204e29749639e8fbd9a1e507c7e5e1dd_6840.shader_test)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 24 -> 32 (33.33 %) (in shaders/deusex_mankind/28cac87049d8c833e72296a5a02ea6118f1144e5_5876.shader_test)
Scratch size: 28 -> 36 (28.57 %) dwords per thread (in shaders/deusex_mankind/28cac87049d8c833e72296a5a02ea6118f1144e5_5876.shader_test)
Code Size: 6988 -> 8036 (15.00 %) bytes (in shaders/cat/1847.shader_test)
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 5 -> 7 (40.00 %) (in shaders/ruiner/0967c5fce7fc456496b1cfa25fbb1d1c4dcf9bed_2958.shader_test)
Wait states: 0 -> 0 (0.00 %)

Max Decrease:

SGPRS: 96 -> 72 (-25.00 %) (in shaders/deusex_mankind/c1f098c7b14b1ba291cfa9bba4e41ba91acaba30_3630.shader_test)
VGPRS: 80 -> 68 (-15.00 %) (in shaders/dirtrally/710d3319bc986ea003f1a84ec6d3c01b2a8b9ded_2482.shader_test)
Spilled SGPRs: 19 -> 0 (-100.00 %) (in shaders/deusex_mankind/0749c9ae23417f918c7286fe502ff5de4cb8e1a0_3276.shader_test)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 9080 -> 8712 (-4.05 %) bytes (in shaders/dirtrally/2aaaae1a8e04b49dee3e39b6f30a7c707f395abf_2480.shader_test)
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 8 -> 5 (-37.50 %) (in shaders/deusex_mankind/8dabec49e5b6c3b1cbcbaee194eff69f164d72f4_3968.shader_test)
Wait states: 0 -> 0 (0.00 %)

PERCENTAGE DELTAS      Shaders     SGPRs     VGPRs SpillSGPR SpillVGPR  PrivVGPR   Scratch  CodeSize  MaxWaves     Waits
0ad                          6         .         .         .         .         .         .         .         .         .
aer                        590         .         .         .         .         .         .         .         .         .
alien_isolation           1414         .         .         .         .         .         .         .         .         .
anholt                      10         .         .         .         .         .         .         .         .         .
bioshock_infinite         2581         .    0.04 %         .         .         .         .    0.01 %         .         .
blackmesa                  584         .         .         .         .         .         .         .         .         .
cat                        573   -0.06 %   -0.09 %         .         .         .         .    0.08 %    0.05 %         .
csgo                      1392         .         .         .         .         .         .         .         .         .
deadisland_definitive     1776         .         .         .         .         .         .         .         .         .
deadisland_original      11602         .         .         .         .         .         .         .         .         .
deadisland_riptide_..      293         .         .         .         .         .         .         .         .         .
deusex_mankind            5051   -0.06 %         .   -4.64 %         .   33.33 %   28.57 %         .   -0.01 %         .
dirtrally                  787    0.04 %    0.49 %   -0.46 %         .         .         .    0.03 %   -0.23 %         .
dolphin                     22         .         .         .         .         .         .         .         .         .
dyinglight                4012         .         .         .         .         .         .         .         .         .
eurotruck2                 216         .         .         .         .         .         .         .         .         .
f1_2015                    746         .   -0.03 %    1.77 %         .         .         .    0.05 %         .         .
glamor                      16         .         .         .         .         .         .         .         .         .
hl2ep1                     294         .         .         .         .         .         .         .         .         .
hl2ep2                     154         .         .         .         .         .         .         .         .         .
hl2lostcoast                66         .         .         .         .         .         .         .         .         .
hlsl3                      582         .         .         .         .         .         .         .         .         .
humus-celshading             4         .         .         .         .         .         .         .         .         .
humus-domino                 6         .         .         .         .         .         .         .         .         .
humus-dynamicbranching      24         .         .         .         .         .         .         .         .         .
humus-hdr                   10         .         .         .         .         .         .         .         .         .
humus-portals                2         .         .         .         .         .         .         .         .         .
humus-volumetricfog..        6         .         .         .         .         .         .         .         .         .
kerbal                    1016         .         .         .         .         .         .         .         .         .
larago                     664         .         .         .         .         .         .         .         .         .
madmax                     354    0.04 %         .         .         .         .         .         .         .         .
metro2033redux            4410         .    0.03 %         .         .         .         .         .   -0.02 %         .
nexuiz                      80         .         .         .         .         .         .         .         .         .
ruiner                     685   -0.10 %   -0.06 %         .         .         .         .    0.04 %    0.04 %         .
sauerbraten                  7         .         .         .         .         .         .         .         .         .
serioussam2017             736    0.03 %   -0.07 %   -0.47 %         .         .         .         .    0.06 %         .
soma                       436         .         .         .         .         .         .         .         .         .
specops                   1814         .         .         .         .         .         .         .         .         .
stellaris                  434         .         .         .         .         .         .         .         .         .
supertuxkart                 4         .         .         .         .         .         .         .         .         .
talos                      762         .         .         .         .         .         .         .         .         .
tesseract                  430         .         .         .         .         .         .         .         .         .
tombraider                1012   -0.03 %         .         .         .         .         .    0.02 %         .         .
total_war_shogun_2         176         .         .         .         .         .         .         .         .         .
total_war_warhammer        218         .         .         .         .         .         .    0.71 %         .         .
ubershaders                 54         .         .         .         .         .         .    0.57 %         .         .
ug_gettysburg              149         .         .         .         .         .         .         .         .         .
unigine_heaven             226         .         .         .         .         .         .         .         .         .
unigine_superposition      733         .         .         .         .         .         .         .         .         .
unigine_valley             288         .         .         .         .         .         .         .         .         .
unity                       72         .         .         .         .         .         .         .         .         .
w40kdawn2                  421         .         .         .         .         .         .   -0.03 %         .         .
w40kdawn3                  164         .         .         .         .         .         .         .         .         .
warsow                     176         .         .         .         .         .         .         .         .         .
warzone2100                  4         .         .         .         .         .         .         .         .         .
witcher2                   928   -0.02 %    0.02 %         .         .         .         .         .         .         .
x3_albion                  641         .         .         .         .         .         .         .         .         .
xblades                    208         .         .         .         .         .         .    0.02 %         .         .
xcom                      1020         .         .         .         .         .         .         .         .         .
xcom2                     1439         .         .         .         .         .         .         .         .         .
yofrankie                   82         .         .         .         .         .         .         .         .         .
------------------------------------------------------------------------------------------------------------------------
All affected               516   -0.52 %    0.41 %   -0.58 %         .    7.41 %    6.67 %    0.26 %   -0.89 %         .
------------------------------------------------------------------------------------------------------------------------
Total                    52662         .    0.01 %   -0.19 %         .    1.34 %    1.09 %    0.01 %         .         .

On 29.04.2018 at 11:34, Gert Wollny wrote:
> On Sunday, 29.04.2018, 10:43 +0200, Benedikt Schemmer wrote:
>> Hi Gert,
>>
>> couldn't resist at least trying what would happen if I enable
>> register merge for radeonsi:
>>
>> PERCENTAGE DELTAS      Shaders     SGPRs     VGPRs SpillSGPR SpillVGPR  PrivVGPR   Scratch  CodeSize  MaxWaves     Waits
>> piglit                   80732   -0.16 %   -0.02 %         .         .    0.87 %    0.86 %    0.04 %         .         .
>> ------------------------------------------------------------------------------------------------------------------------
>> All affected               513  -17.58 %   -2.30 %         .         .    4.12 %    5.87 %    1.73 %    0.10 %         .
>> ------------------------------------------------------------------------------------------------------------------------
>> Total                    80732   -0.16 %   -0.02 %         .         .    0.87 %    0.86 %    0.04 %         .         .
>>
>> I had already removed the defines around the debug code so that's also
>> happily outputting data.
>>
>> fails with two piglit shaders:
> What are the names of these tests? I'd like to check this on r600,
> because here I didn't see any regressions last time I checked.
>
>
>> Real world is a little different:
>
> I guess these tests refer to enabled register_merge - without and with
> this patch set, no?
>
> Out of curiosity, did you also look at how enabling register_merge
> (before this series) impacts the result as compared to the normal
> operation of radeonsi?
>
>
>> If there's an easy way to figure out when your code makes it worse and
>> when it's an improvement this would be really nice.
>
> My incentive for this series was that on r600 the arrays are allocated
> before the final optimization pass on the byte code that requires that
> the number of registers is <= 124. When I started this, no spilling was
> implemented, and shaders with too many arrays and registers would
> simply fail. Now spilling is implemented, but AFAIK reducing the
> number of registers in the final optimization pass does not result in
> changed spilling, so bringing down the number of registers before
> tgsi-to-bytecode is still of interest.
>
> For radeonsi my guess would be that the llvm optimizer works better
> when the registers are not yet merged, and that would be the reason why
> register_merge is disabled.
>
> In any case, Timothy wrote in this thread [1] (last message) that he
> had similar patches for NIR.
>
> Best,
> Gert
>
> [1] https://patchwork.freedesktop.org/patch/189842/
>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
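
For anyone who wants to sanity-check the percentages in the Max Increase / Max Decrease lists: they are consistent with a plain relative change, (after - before) / before * 100. Below is a minimal Python sketch under that assumption. `percent_delta` is a hypothetical helper written for this mail, not shader-db's actual code, and returning 0 for a zero baseline is only inferred from the "Spilled SGPRs: 0 -> 16 (0.00 %)" line.

```python
def percent_delta(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent.

    Assumption: a zero baseline is reported as 0.0. This is inferred from
    the "0 -> 16 (0.00 %)" line in the report, not taken from shader-db.
    """
    if before == 0:
        return 0.0
    return (after - before) / before * 100.0


# (before, after) pairs copied from the Max Increase / Max Decrease lists.
samples = {
    "SGPRS (max increase)": (80, 96),
    "VGPRS (max increase)": (64, 84),
    "Spilled SGPRs (max increase)": (0, 16),
    "Code Size (max decrease)": (9080, 8712),
    "Max Waves (max decrease)": (8, 5),
}

for name, (before, after) in samples.items():
    delta = percent_delta(before, after)
    print(f"{name}: {before} -> {after} ({delta:.2f} %)")
```

Note that a zero baseline hides real regressions in the percentage column (16 newly spilled SGPRs show up as 0.00 %), so the absolute before/after numbers are worth reading alongside the deltas.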