On Thu, 2019-02-28 at 16:20 -0800, Ian Romanick wrote: > On 2/28/19 4:47 AM, Iago Toral wrote: > > On Wed, 2019-02-27 at 17:04 -0800, Ian Romanick wrote: > > > On 2/27/19 4:45 AM, Iago Toral Quiroga wrote: > > > > Now that we propagate constants to the first source of 2src > > > > instructions we > > > > see more opportunities of constant folding in the backend. > > > > > > > > Shader-db results on KBL: > > > > > > > > total instructions in shared programs: 14965607 -> 14855983 (- > > > > 0.73%) > > > > instructions in affected programs: 3988102 -> 3878478 (-2.75%) > > > > helped: 14292 > > > > HURT: 59 > > > > > > > > total cycles in shared programs: 344324295 -> 340656008 (- > > > > 1.07%) > > > > cycles in affected programs: 247527740 -> 243859453 (-1.48%) > > > > helped: 14056 > > > > HURT: 3314 > > > > > > > > total loops in shared programs: 4283 -> 4283 (0.00%) > > > > loops in affected programs: 0 -> 0 > > > > helped: 0 > > > > HURT: 0 > > > > > > > > total spills in shared programs: 27812 -> 24350 (-12.45%) > > > > spills in affected programs: 24921 -> 21459 (-13.89%) > > > > helped: 345 > > > > HURT: 19 > > > > > > > > total fills in shared programs: 24173 -> 22032 (-8.86%) > > > > fills in affected programs: 21124 -> 18983 (-10.14%) > > > > helped: 355 > > > > HURT: 25 > > > > > > Ignore my previous questions about nir_opt_constant_folding after > > > nir_opt_algebraic_late. I had done that because I added a bunch > > > of > > > things to nir_opt_algebraic_late that created my constant folding > > > opportunities. > > > > > > This is the combined changes for this patch and the previous > > > patch. For > > > this patch alone, I got: > > > > > > total instructions in shared programs: 15306213 -> 15221518 (- > > > 0.55%) > > > instructions in affected programs: 2911451 -> 2826756 (-2.91%) > > > helped: 13121 > > > HURT: 44 > > > helped stats (abs) min: 1 max: 51 x̄: 6.66 x̃: 6 > > > helped stats (rel) min: <.01% max: 16.67% x̄: 4.27% x̃: 3.30% > > > HURT stats (abs) min: 3 max: 453 x̄: 61.16 x̃: 5 > > > HURT stats (rel) min: 0.20% max: 151.00% x̄: 31.57% x̃: 19.23% > > > 95% mean confidence interval for instructions value: -6.61 -6.26 > > > 95% mean confidence interval for instructions %-change: -4.23% > > > -4.07% > > > Instructions are helped. > > > > > > total cycles in shared programs: 375419164 -> 372829148 (-0.69%) > > > cycles in affected programs: 146769299 -> 144179283 (-1.76%) > > > helped: 10992 > > > HURT: 1833 > > > helped stats (abs) min: 1 max: 56127 x̄: 250.29 x̃: 18 > > > helped stats (rel) min: <.01% max: 40.52% x̄: 3.11% x̃: 2.58% > > > HURT stats (abs) min: 1 max: 1718 x̄: 87.93 x̃: 42 > > > HURT stats (rel) min: <.01% max: 139.33% x̄: 7.74% x̃: 3.08% > > > 95% mean confidence interval for cycles value: -248.21 -155.69 > > > 95% mean confidence interval for cycles %-change: -1.67% -1.44% > > > Cycles are helped. > > > > > > total spills in shared programs: 28828 -> 28888 (0.21%) > > > spills in affected programs: 2037 -> 2097 (2.95%) > > > helped: 0 > > > HURT: 24 > > > > > > total fills in shared programs: 35542 -> 35639 (0.27%) > > > fills in affected programs: 3078 -> 3175 (3.15%) > > > helped: 2 > > > HURT: 26 > > > > > > I decided to look at some of the hurt shaders... it looks like > > > some > > > of > > > the Unigine geometry shaders really took a beating (+150% > > > instructions). > > > Note the "max" in the "instructions in affected programs" above. > > > > I am seeing quite different results on my KBL laptop: > > > > total instructions in shared programs: 14945933 -> 14858158 (- > > 0.59%) > > instructions in affected programs: 2842901 -> 2755126 (-3.09%) > > helped: 13196 > > HURT: 5 > > > > instructions HURT: shaders/closed/steam/deus-ex-mankind- > > divided/274.shader_test CS SIMD8: 1535 -> 1538 (0.20%) > > instructions HURT: shaders/closed/steam/deus-ex-mankind- > > divided/184.shader_test CS SIMD8: 1535 -> 1538 (0.20%) > > instructions HURT: shaders/dolphin/ubershaders/147.shader_test FS > > SIMD8: 3481 -> 3491 (0.29%) > > instructions HURT: shaders/dolphin/ubershaders/156.shader_test FS > > SIMD8: 3465 -> 3475 (0.29%) > > instructions HURT: shaders/dolphin/ubershaders/138.shader_test FS > > SIMD8: 3465 -> 3475 (0.29%) > > > > Did you test on a different gen? Can you paste here the paths of > > some > > of the GS shaders where you see the big regressions so I can verify > > I > > have them in my shader-db? > > > > Also, how did you test this patch exactly? When I was going to > > capture > > the reference shader-db results for patch 2 in this series so I > > could > > extract the results for patch 3 by comparing against it, I noticed > > that > > patch 2 would create constant folding scenarios (for example for > > ADD > > and MUL) that, before this patch, would hit an assertion in the > > driver > > since the algebraic pass only expects to find these opportunities > > for F > > types and will assert on that, so I guess you noticed this and > > fixed it > > before taking your numbers? > > I ran it through my usual shader-db gauntlet that runs shader-db at > each > commit for SKL, BDW, HSW, IVB, SNB, ILK, and GM45. *But* since one > pass > of that takes a really, really long time, I only run release builds > with > -march=native and all the other tricks. None of the assertions would > exist in that run. > > If patch 2 creates possible assertion failures, the two patches > should > probably be re-ordered or the previous patch should do something to > avoid the assertions. Some day, someone will hit that patch in a > bisect, and they will be sad.
Yes, absolutely. I have already fixed that locally. I'll send a v2 for review as soon as I verify the series in CI again. Iago > What I think happened is the previous patch helped a bunch of shaders > by > breaking them, and this patch fixes them... making them > "worse." When I > look at the total across both patches, I get results much more > similar > to yours, so this was a false alarm. > > Skylake > total instructions in shared programs: 15328886 -> 15219343 (-0.71%) > instructions in affected programs: 3999638 -> 3890095 (-2.74%) > helped: 14311 > HURT: 65 > helped stats (abs) min: 1 max: 502 x̄: 7.67 x̃: 4 > helped stats (rel) min: 0.04% max: 26.58% x̄: 4.07% x̃: 3.05% > HURT stats (abs) min: 1 max: 22 x̄: 2.35 x̃: 1 > HURT stats (rel) min: 0.06% max: 1.96% x̄: 0.42% x̃: 0.34% > 95% mean confidence interval for instructions value: -7.88 -7.36 > 95% mean confidence interval for instructions %-change: -4.10% -3.99% > Instructions are helped. > > total cycles in shared programs: 376419186 -> 372742630 (-0.98%) > cycles in affected programs: 277719712 -> 274043156 (-1.32%) > helped: 14204 > HURT: 3270 > helped stats (abs) min: 1 max: 118186 x̄: 278.57 x̃: 17 > helped stats (rel) min: <.01% max: 40.52% x̄: 2.71% x̃: 2.14% > HURT stats (abs) min: 1 max: 5031 x̄: 85.69 x̃: 26 > HURT stats (rel) min: <.01% max: 65.74% x̄: 6.12% x̃: 2.00% > 95% mean confidence interval for cycles value: -247.21 -173.60 > 95% mean confidence interval for cycles %-change: -1.15% -0.96% > Cycles are helped. > > total spills in shared programs: 32328 -> 28888 (-10.64%) > spills in affected programs: 26262 -> 22822 (-13.10%) > helped: 345 > HURT: 17 > > total fills in shared programs: 37768 -> 35639 (-5.64%) > fills in affected programs: 22394 -> 20265 (-9.51%) > helped: 354 > HURT: 24 > > LOST: 0 > GAINED: 5 > > > > > More comments below by SHL... > > > > > > > LOST: 0 > > > > GAINED: 5 > > > > --- > > > > src/intel/compiler/brw_fs.cpp | 203 > > > > ++++++++++++++++++++++++++++++++-- > > > > 1 file changed, 195 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/src/intel/compiler/brw_fs.cpp > > > > b/src/intel/compiler/brw_fs.cpp > > > > index 2358acbeb59..b2b60237c82 100644 > > > > --- a/src/intel/compiler/brw_fs.cpp > > > > +++ b/src/intel/compiler/brw_fs.cpp > > > > @@ -2583,9 +2583,55 @@ fs_visitor::opt_algebraic() > > > > break; > > > > > > > > case BRW_OPCODE_MUL: > > > > - if (inst->src[1].file != IMM) > > > > + if (inst->src[0].file != IMM && inst->src[1].file != > > > > IMM) > > > > continue; > > > > > > > > + /* Constant folding */ > > > > + if (inst->src[0].file == IMM && inst->src[1].file == > > > > IMM) > > > > { > > > > + assert(inst->src[0].type == inst->src[1].type); > > > > + bool local_progress = true; > > > > + switch (inst->src[0].type) { > > > > + case BRW_REGISTER_TYPE_HF: { > > > > + float v1 = _mesa_half_to_float(inst->src[0].ud > > > > & > > > > 0xffffu); > > > > + float v2 = _mesa_half_to_float(inst->src[1].ud > > > > & > > > > 0xffffu); > > > > + inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 > > > > * > > > > v2)); > > > > + break; > > > > + } > > > > + case BRW_REGISTER_TYPE_W: { > > > > + int16_t v1 = inst->src[0].ud & 0xffffu; > > > > + int16_t v2 = inst->src[1].ud & 0xffffu; > > > > + inst->src[0] = brw_imm_w(v1 * v2); > > > > + break; > > > > + } > > > > + case BRW_REGISTER_TYPE_UW: { > > > > + uint16_t v1 = inst->src[0].ud & 0xffffu; > > > > + uint16_t v2 = inst->src[1].ud & 0xffffu; > > > > + inst->src[0] = brw_imm_uw(v1 * v2); > > > > + break; > > > > + } > > > > + case BRW_REGISTER_TYPE_F: > > > > + inst->src[0].f *= inst->src[1].f; > > > > + break; > > > > + case BRW_REGISTER_TYPE_D: > > > > + inst->src[0].d *= inst->src[1].d; > > > > + break; > > > > + case BRW_REGISTER_TYPE_UD: > > > > + inst->src[0].ud *= inst->src[1].ud; > > > > + break; > > > > + default: > > > > + local_progress = false; > > > > + break; > > > > + }; > > > > + > > > > + if (local_progress) { > > > > + inst->opcode = BRW_OPCODE_MOV; > > > > + inst->src[1] = reg_undef; > > > > + progress = true; > > > > + break; > > > > + } > > > > + } > > > > + > > > > + > > > > /* a * 1.0 = a */ > > > > if (inst->src[1].is_one()) { > > > > inst->opcode = BRW_OPCODE_MOV; > > > > @@ -2594,6 +2640,14 @@ fs_visitor::opt_algebraic() > > > > break; > > > > } > > > > > > > > + if (inst->src[0].is_one()) { > > > > + inst->opcode = BRW_OPCODE_MOV; > > > > + inst->src[0] = inst->src[1]; > > > > + inst->src[1] = reg_undef; > > > > + progress = true; > > > > + break; > > > > + } > > > > + > > > > /* a * -1.0 = -a */ > > > > if (inst->src[1].is_negative_one()) { > > > > inst->opcode = BRW_OPCODE_MOV; > > > > @@ -2603,27 +2657,160 @@ fs_visitor::opt_algebraic() > > > > break; > > > > } > > > > > > > > - if (inst->src[0].file == IMM) { > > > > - assert(inst->src[0].type == BRW_REGISTER_TYPE_F); > > > > + if (inst->src[0].is_negative_one()) { > > > > + inst->opcode = BRW_OPCODE_MOV; > > > > + inst->src[0] = inst->src[1]; > > > > + inst->src[0].negate = !inst->src[1].negate; > > > > + inst->src[1] = reg_undef; > > > > + progress = true; > > > > + break; > > > > + } > > > > + > > > > + /* a * 0 = 0 (this is not exact for floating point) > > > > */ > > > > + if (inst->src[1].is_zero() && > > > > + brw_reg_type_is_integer(inst->src[1].type)) { > > > > + inst->opcode = BRW_OPCODE_MOV; > > > > + inst->src[0] = inst->src[1]; > > > > + inst->src[1] = reg_undef; > > > > + progress = true; > > > > + break; > > > > + } > > > > + > > > > + if (inst->src[0].is_zero() && > > > > + brw_reg_type_is_integer(inst->src[0].type)) { > > > > inst->opcode = BRW_OPCODE_MOV; > > > > - inst->src[0].f *= inst->src[1].f; > > > > inst->src[1] = reg_undef; > > > > progress = true; > > > > break; > > > > } > > > > break; > > > > case BRW_OPCODE_ADD: > > > > - if (inst->src[1].file != IMM) > > > > + if (inst->src[0].file != IMM && inst->src[1].file != > > > > IMM) > > > > continue; > > > > > > > > - if (inst->src[0].file == IMM) { > > > > - assert(inst->src[0].type == BRW_REGISTER_TYPE_F); > > > > + /* Constant folding */ > > > > + if (inst->src[0].file == IMM && inst->src[1].file == > > > > IMM) > > > > { > > > > + assert(inst->src[0].type == inst->src[1].type); > > > > + bool local_progress = true; > > > > + switch (inst->src[0].type) { > > > > + case BRW_REGISTER_TYPE_HF: { > > > > + float v1 = _mesa_half_to_float(inst->src[0].ud > > > > & > > > > 0xffffu); > > > > + float v2 = _mesa_half_to_float(inst->src[1].ud > > > > & > > > > 0xffffu); > > > > + inst->src[0] = brw_imm_w(_mesa_float_to_half(v1 > > > > + > > > > v2)); > > > > + break; > > > > + } > > > > + case BRW_REGISTER_TYPE_W: { > > > > + int16_t v1 = inst->src[0].ud & 0xffffu; > > > > + int16_t v2 = inst->src[1].ud & 0xffffu; > > > > + inst->src[0] = brw_imm_w(v1 + v2); > > > > + break; > > > > + } > > > > + case BRW_REGISTER_TYPE_UW: { > > > > + uint16_t v1 = inst->src[0].ud & 0xffffu; > > > > + uint16_t v2 = inst->src[1].ud & 0xffffu; > > > > + inst->src[0] = brw_imm_uw(v1 + v2); > > > > + break; > > > > + } > > > > + case BRW_REGISTER_TYPE_F: > > > > + inst->src[0].f += inst->src[1].f; > > > > + break; > > > > + case BRW_REGISTER_TYPE_D: > > > > + inst->src[0].d += inst->src[1].d; > > > > + break; > > > > + case BRW_REGISTER_TYPE_UD: > > > > + inst->src[0].ud += inst->src[1].ud; > > > > + break; > > > > + default: > > > > + local_progress = false; > > > > + break; > > > > + }; > > > > + > > > > + if (local_progress) { > > > > + inst->opcode = BRW_OPCODE_MOV; > > > > + inst->src[1] = reg_undef; > > > > + progress = true; > > > > + break; > > > > + } > > > > + } > > > > + > > > > + /* a + 0 = a (this is not exact for floating point) > > > > */ > > > > + if (inst->src[1].is_zero() && > > > > + brw_reg_type_is_integer(inst->src[1].type)) { > > > > inst->opcode = BRW_OPCODE_MOV; > > > > - inst->src[0].f += inst->src[1].f; > > > > inst->src[1] = reg_undef; > > > > progress = true; > > > > break; > > > > } > > > > + > > > > + if (inst->src[0].is_zero() && > > > > + brw_reg_type_is_integer(inst->src[0].type)) { > > > > + inst->opcode = BRW_OPCODE_MOV; > > > > + inst->src[0] = inst->src[1]; > > > > + inst->src[1] = reg_undef; > > > > + progress = true; > > > > + break; > > > > + } > > > > + break; > > > > + case BRW_OPCODE_SHL: > > > > + if (inst->src[0].file == IMM && inst->src[1].file == > > > > IMM) > > > > { > > > > + bool local_progress = true; > > > > + switch (inst->src[0].type) { > > > > + case BRW_REGISTER_TYPE_D: > > > > + case BRW_REGISTER_TYPE_UD: > > > > + inst->src[0].ud <<= inst->src[1].ud; > > > > + break; > > > > > > There's something fishy going on here. In one of the less hurt > > > Unigine > > > shaders, the before code was: > > > > > > mov(1) g2<1>UD 0x00000001UD { > > > align1 WE_all 1N compacted }; > > > mov(8) g126<1>UD g1<8,8,1>UD { > > > align1 WE_all 1Q compacted }; > > > shl(8) g127<1>UD g2<0,1,0>UD 0xffffffffUD { > > > align1 1Q compacted }; > > > > > > and the after code is > > > > > > mov(8) g126<1>UD g1<8,8,1>UD { > > > align1 WE_all 1Q compacted }; > > > mov(8) g127<1>UD 0x00000008UD { > > > align1 WE_all 1Q compacted }; > > > > > > That doesn't seem right. I thought 1U << 0xffffffffU should be > > > 1u << > > > 31 > > > = 0x80000000. Since Gen explicitly only uses the low 5 bits of > > > src1, > > > maybe we need "& 0x1f" around all the shift counts? I thought > > > the > > > CPU > > > did the same thing... > > > > > > Does this series pass the CI? > > > > > > > + case BRW_REGISTER_TYPE_W: > > > > + case BRW_REGISTER_TYPE_UW: { > > > > + uint16_t v1 = inst->src[0].ud & 0xffffu; > > > > + uint16_t v2 = inst->src[1].ud & 0xffffu; > > > > + inst->src[0] = retype(brw_imm_uw(v1 << v2), > > > > inst- > > > > > src[0].type); > > > > > > > > + break; > > > > + } > > > > + default: > > > > + local_progress = false; > > > > + break; > > > > + } > > > > + if (local_progress) { > > > > + inst->opcode = BRW_OPCODE_MOV; > > > > + inst->src[1] = reg_undef; > > > > + progress = true; > > > > + break; > > > > + } > > > > + } > > > > + break; > > > > + case BRW_OPCODE_SHR: > > > > + if (inst->src[0].file == IMM && inst->src[1].file == > > > > IMM) > > > > { > > > > + bool local_progress = true; > > > > + switch (inst->src[0].type) { > > > > + case BRW_REGISTER_TYPE_D: > > > > + inst->src[0].d >>= inst->src[1].ud; > > > > + break; > > > > + case BRW_REGISTER_TYPE_UD: > > > > + inst->src[0].ud >>= inst->src[1].ud; > > > > + break; > > > > + case BRW_REGISTER_TYPE_W: { > > > > + int16_t v1 = inst->src[0].ud & 0xffffu; > > > > + uint16_t v2 = inst->src[1].ud & 0xffffu; > > > > + inst->src[0] = brw_imm_w(v1 >> v2); > > > > + break; > > > > + } > > > > + case BRW_REGISTER_TYPE_UW: { > > > > + uint16_t v1 = inst->src[0].ud & 0xffffu; > > > > + uint16_t v2 = inst->src[1].ud & 0xffffu; > > > > + inst->src[0] = brw_imm_uw(v1 >> v2); > > > > + break; > > > > + } > > > > + default: > > > > + local_progress = false; > > > > + break; > > > > + } > > > > + if (local_progress) { > > > > + inst->opcode = BRW_OPCODE_MOV; > > > > + inst->src[1] = reg_undef; > > > > + progress = true; > > > > + break; > > > > + } > > > > + } > > > > break; > > > > case BRW_OPCODE_OR: > > > > if (inst->src[0].equals(inst->src[1]) || > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev