Kenneth Graunke <kenn...@whitecape.org> writes: > (Co-authored by Matt Turner.) > > Image atomics, for example, return a value - but the shader may not > want to use it. We assigned a useless VGRF destination. This seemed > harmless, but it can actually be quite harmful. The register allocator > has to assign that VGRF to a real register. It may assign the same > actual GRF to the destination of an instruction that follows soon after. > > This results in a write-after-write (WAW) dependency, and stall. > > A number of "Deus Ex: Mankind Divided" shaders use image atomics, but > don't use the return value. Several of these were hitting WAW stalls > for nearly 14,000 cycles a pop. This patch cuts one shader's cycles > by -98.35%! >
This will also help reduce the amount of traffic between the EU and HDC, because [as ISTR we discussed yesterday in the office] the generator asks the HDC not to send any return payload when it sees a null destination region. OTOH the 14 kcycle estimate is likely to be overly pessimistic, especially for more recent hardware, the actual number of cycles spent in the stall could easily be two orders of magnitude lower than the estimate -- Did you happen to test this change on the real workload? > Making dead code elimination null out the destination avoids this issue. > We can drop the redundant NOP handling as well - a following hunk > already turns instructions with null destinations and no effects into > NOPs. > The code supposed to do that seems rather buggy... The code you removed below would only NOP an instruction if it didn't write the accumulator or flag register, but the following hunk would then go and NOP it if the flag register wasn't live, regardless of whether the accumulator result From the instruction was useful, and regardless of whether the instruction had any side effects. > We do need to special case memory barriers as they need an actual > VGRF to function correctly, even though that registers serves only > to create register dependencies. > I don't think memory barriers are the only case this could break. AFAIK a SEND message with null destination register needs to specify a return payload length of zero, which may or may not be valid depending on the message... The only reason this optimization works as-is for surface atomic messages is that the FS generator already special-cases null destination registers [basically in order to make this optimization possible, this codepath is currently unused] and then asks brw_eu_emit for a surface atomic SEND message without return payload, which requires the "return data expected" control bit to be set up accordingly in the SFID-specific message control fields. I'm not aware of any send-like virtual instruction currently supported by the back-end which is able to do the same thing other than SHADER_OPCODE_TYPED_ATOMIC and SHADER_OPCODE_UNTYPED_ATOMIC. I suggest you define a 'can_omit_write()' predicate (either as a static function or as a backend_instruction method) which returns true except for non-surface atomic send-from-grf instructions, and use it below instead of the SHADER_OPCODE_MEMORY_FENCE special casing. > On Skylake: > > total instructions in shared programs: 13700090 -> 13699698 (-0.00%) > instructions in affected programs: 13514 -> 13122 (-2.90%) > helped: 2 > HURT: 0 > > total cycles in shared programs: 288879704 -> 278651496 (-3.54%) > cycles in affected programs: 15842112 -> 5613904 (-64.56%) > helped: 19 > HURT: 3 > > total spills in shared programs: 15040 -> 14990 (-0.33%) > spills in affected programs: 140 -> 90 (-35.71%) > helped: 2 > HURT: 0 > > total fills in shared programs: 17328 -> 17264 (-0.37%) > fills in affected programs: 228 -> 164 (-28.07%) > helped: 2 > HURT: 0 > > Signed-off-by: Kenneth Graunke <kenn...@whitecape.org> > --- > src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > index 8a0469a..4c524a4 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_dead_code_eliminate.cpp > @@ -52,7 +52,8 @@ fs_visitor::dead_code_eliminate() > sizeof(BITSET_WORD)); > > foreach_inst_in_block_reverse_safe(fs_inst, inst, block) { > - if (inst->dst.file == VGRF && !inst->has_side_effects()) { > + if (inst->dst.file == VGRF && > + inst->opcode != SHADER_OPCODE_MEMORY_FENCE) { > const unsigned var = live_intervals->var_from_reg(inst->dst); > bool result_live = false; > > @@ -60,13 +61,8 @@ fs_visitor::dead_code_eliminate() > result_live |= BITSET_TEST(live, var + i); > > if (!result_live) { > + inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type)); > progress = true; > - > - if (inst->writes_accumulator || inst->flags_written()) { > - inst->dst = fs_reg(retype(brw_null_reg(), inst->dst.type)); > - } else { > - inst->opcode = BRW_OPCODE_NOP; > - } > } > } > > -- > 2.10.2 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev