I looked into removing these passes a while back. A little detail here is that DCE has to get run at least a little bit for correctness reasons -- some (lazy) glsl -> tgsi code uses the dead_mask with the assumption that DCE is run later.
Another observation was that while, indeed, those passes take a while, they also remove a bunch of code. Not running those passes makes them disappear from the profile, but then the logic down the line has more work to do. It's not as obvious of a win as one might think. Additionally, at least for nouveau, this logic does better than nouveau's codegen in some cases. (Largely due to the fact that system values get copy-propagated here but they don't by codegen, generally, which can lead to large register live intervals. At least that's what I recall from my analysis a while ago.) This was my hack-branch: https://github.com/imirkin/mesa/commits/st-pass-cleanup On Wed, May 3, 2017 at 8:36 PM, Timothy Arceri <tarc...@itsqueeze.com> wrote: > This and the tgsi copy_propagation pass are very slow, I'd really like it if > we both didn't require the pass for things to work and also didn't make it > any slower. perf is showing these 2 passes at ~11% during Deus Ex start-up > (cold cache). > > Can we not just add a DCE call to the glsl linker/compiler calls e.g. > > if (ctx->Const.GLSLOptimizeConservatively) { > /* Run it just once. */ > do_common_optimization(ir, true, false, > &ctx->Const.ShaderCompilerOptions[stage], > ctx->Const.NativeIntegers); > > ---> call DCE here to clean up since we don't call the opt passes again > <--- > > } else { > /* Repeat it until it stops making changes. */ > while (do_common_optimization(ir, true, false, > > &ctx->Const.ShaderCompilerOptions[stage], > ctx->Const.NativeIntegers)) > ; > > } > > On 03/05/17 00:43, Samuel Pitoiset wrote: >> >> The TGSI DCE pass doesn't eliminate dead assignments like >> MOV TEMP[0], TEMP[1] in presence of loops because it assumes >> that the visitor doesn't emit dead code. This assumption is >> actually wrong and this situation happens. >> >> However, it appears that the merge_registers() pass accidentally >> takes care of this for some weird reasons. But since this pass has >> been disabled for RadeonSI and Nouveau, the renumber_registers() >> pass which is called *after*, can't do its job correctly. >> >> This is because it assumes that no dead code is present. But if >> there is still a dead assignment, it might re-use the TEMP >> register id incorrectly and emits wrong code. >> >> This patches eliminates all dead assignments by tracking >> all temporary register reads like what the renumber_registers() >> actually does. The best solution would be to rewrite that DCE >> pass entirely but it's more work. >> >> This should fix Unigine Heaven on RadeonSI and Nouveau. >> >> shader-db results with RadeonSI: >> >> 47109 shaders in 29632 tests >> Totals: >> SGPRS: 1919548 -> 1919572 (0.00 %) >> VGPRS: 1139205 -> 1139209 (0.00 %) >> Spilled SGPRs: 1865 -> 1867 (0.11 %) >> Spilled VGPRs: 65 -> 65 (0.00 %) >> Private memory VGPRs: 1184 -> 1184 (0.00 %) >> Scratch size: 1308 -> 1308 (0.00 %) dwords per thread >> Code Size: 60083036 -> 60083244 (0.00 %) bytes >> LDS: 1077 -> 1077 (0.00 %) blocks >> Max Waves: 431200 -> 431200 (0.00 %) >> Wait states: 0 -> 0 (0.00 %) >> >> It's still interesting to disable the merge_registers() pass. >> >> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com> >> --- >> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36 >> +++++++++++++++++++++++++----- >> 1 file changed, 31 insertions(+), 5 deletions(-) >> >> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> index 0f8688a41c..01b5a4dc98 100644 >> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp >> @@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void) >> >> glsl_to_tgsi_instruction *, >> this->next_temp * >> 4); >> int *write_level = rzalloc_array(mem_ctx, int, this->next_temp * 4); >> + int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp); >> int level = 0; >> int removed = 0; >> + for (int i = 0; i < this->next_temp; i++) >> + first_reads[i] = -1; >> + >> foreach_in_list(glsl_to_tgsi_instruction, inst, &this->instructions) >> { >> assert(inst->dst[0].file != PROGRAM_TEMPORARY >> || inst->dst[0].index < this->next_temp); >> @@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void) >> src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3); >> for (int c = 0; c < 4; c++) { >> - if (src_chans & (1 << c)) >> + if (src_chans & (1 << c)) { >> writes[4 * inst->src[i].index + c] = NULL; >> + >> + if (first_reads[inst->src[i].index] == -1) >> + first_reads[inst->src[i].index] = level; >> + } >> } >> } >> } >> @@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void) >> src_chans |= 1 << GET_SWZ(inst->tex_offsets[i].swizzle, >> 3); >> for (int c = 0; c < 4; c++) { >> - if (src_chans & (1 << c)) >> + if (src_chans & (1 << c)) { >> writes[4 * inst->tex_offsets[i].index + c] = NULL; >> + >> + if (first_reads[inst->tex_offsets[i].index] == -1) >> + first_reads[inst->tex_offsets[i].index] = level; >> + } >> } >> } >> } >> @@ -5211,17 +5223,30 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void) >> * the writemask of other instructions with dead channels. >> */ >> foreach_in_list_safe(glsl_to_tgsi_instruction, inst, >> &this->instructions) { >> - if (!inst->dead_mask || !inst->dst[0].writemask) >> + bool dead_inst = false; >> + >> + if (!inst->dst[0].writemask) >> continue; >> + >> /* No amount of dead masks should remove memory stores */ >> if (inst->info->is_store) >> continue; >> - if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) { >> + if (!inst->dead_mask) { >> + if (inst->dst[0].file == PROGRAM_TEMPORARY && >> + first_reads[inst->dst[0].index] == -1) { >> + dead_inst = true; >> + } >> + } else { >> + if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) >> + dead_inst = true; >> + } >> + >> + if (dead_inst) { >> inst->remove(); >> delete inst; >> removed++; >> - } else { >> + } else if (inst->dead_mask) { >> if (glsl_base_type_is_64bit(inst->dst[0].type)) { >> if (inst->dead_mask == WRITEMASK_XY || >> inst->dead_mask == WRITEMASK_ZW) >> @@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void) >> } >> ralloc_free(write_level); >> + ralloc_free(first_reads); >> ralloc_free(writes); >> return removed; >> > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev