On 04/05/17 10:43, Ilia Mirkin wrote:
I looked into removing these passes a while back. A little detail here
is that DCE has to get run at least a little bit for correctness
reasons -- some (lazy) glsl -> tgsi

Is there anyway to improve this? I was actually looking at this a little yesterday but to be honest I was getting a little lost in the code. It wasn't entirely clear to me how this conversion pass works and if we can stop doing this in the first place.

glsl_to_tgsi_visitor::visit_expression(ir_expression* ir, st_src_reg *op) and glsl_to_tgsi_visitor::visit(ir_texture *ir)

Have a comment:

   /* Storage for our result.  Ideally for an assignment we'd be using
    * the actual storage for the result here, instead.
    */

The code all seems to be pretty much a copy of ir_to_mesa which has the same limitations.

code uses the dead_mask with the
assumption that DCE is run later.

Another observation was that while, indeed, those passes take a while,
they also remove a bunch of code. Not running those passes makes them
disappear from the profile, but then the logic down the line has more
work to do. It's not as obvious of a win as one might think.

Yes I noticed this also :)


Additionally, at least for nouveau, this logic does better than
nouveau's codegen in some cases. (Largely due to the fact that system
values get copy-propagated here but they don't by codegen, generally,
which can lead to large register live intervals. At least that's what
I recall from my analysis a while ago.)

This was my hack-branch: https://github.com/imirkin/mesa/commits/st-pass-cleanup

On Wed, May 3, 2017 at 8:36 PM, Timothy Arceri <tarc...@itsqueeze.com> wrote:
This and the tgsi copy_propagation pass are very slow, I'd really like it if
we both didn't require the pass for things to work and also didn't make it
any slower. perf is showing these 2 passes at ~11% during Deus Ex start-up
(cold cache).

Can we not just add a DCE call to the glsl linker/compiler calls e.g.

    if (ctx->Const.GLSLOptimizeConservatively) {
       /* Run it just once. */
       do_common_optimization(ir, true, false,
                                 &ctx->Const.ShaderCompilerOptions[stage],
                                 ctx->Const.NativeIntegers);

      ---> call DCE here to clean up since we don't call the opt passes again
<---

    } else {
       /* Repeat it until it stops making changes. */
       while (do_common_optimization(ir, true, false,

&ctx->Const.ShaderCompilerOptions[stage],
                                     ctx->Const.NativeIntegers))
          ;

    }

On 03/05/17 00:43, Samuel Pitoiset wrote:

The TGSI DCE pass doesn't eliminate dead assignments like
MOV TEMP[0], TEMP[1] in presence of loops because it assumes
that the visitor doesn't emit dead code. This assumption is
actually wrong and this situation happens.

However, it appears that the merge_registers() pass accidentally
takes care of this for some weird reasons. But since this pass has
been disabled for RadeonSI and Nouveau, the renumber_registers()
pass which is called *after*, can't do its job correctly.

This is because it assumes that no dead code is present. But if
there is still a dead assignment, it might re-use the TEMP
register id incorrectly and emits wrong code.

This patches eliminates all dead assignments by tracking
all temporary register reads like what the renumber_registers()
actually does. The best solution would be to rewrite that DCE
pass entirely but it's more work.

This should fix Unigine Heaven on RadeonSI and Nouveau.

shader-db results with RadeonSI:

47109 shaders in 29632 tests
Totals:
SGPRS: 1919548 -> 1919572 (0.00 %)
VGPRS: 1139205 -> 1139209 (0.00 %)
Spilled SGPRs: 1865 -> 1867 (0.11 %)
Spilled VGPRs: 65 -> 65 (0.00 %)
Private memory VGPRs: 1184 -> 1184 (0.00 %)
Scratch size: 1308 -> 1308 (0.00 %) dwords per thread
Code Size: 60083036 -> 60083244 (0.00 %) bytes
LDS: 1077 -> 1077 (0.00 %) blocks
Max Waves: 431200 -> 431200 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

It's still interesting to disable the merge_registers() pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com>
---
   src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 36
+++++++++++++++++++++++++-----
   1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 0f8688a41c..01b5a4dc98 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -5085,9 +5085,13 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)

glsl_to_tgsi_instruction *,
                                                        this->next_temp *
4);
      int *write_level = rzalloc_array(mem_ctx, int, this->next_temp * 4);
+   int *first_reads = rzalloc_array(mem_ctx, int, this->next_temp);
      int level = 0;
      int removed = 0;
   +   for (int i = 0; i < this->next_temp; i++)
+      first_reads[i] = -1;
+
      foreach_in_list(glsl_to_tgsi_instruction, inst, &this->instructions)
{
         assert(inst->dst[0].file != PROGRAM_TEMPORARY
                || inst->dst[0].index < this->next_temp);
@@ -5148,8 +5152,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
                  src_chans |= 1 << GET_SWZ(inst->src[i].swizzle, 3);
                    for (int c = 0; c < 4; c++) {
-                  if (src_chans & (1 << c))
+                  if (src_chans & (1 << c)) {
                        writes[4 * inst->src[i].index + c] = NULL;
+
+                     if (first_reads[inst->src[i].index] == -1)
+                        first_reads[inst->src[i].index] = level;
+                  }
                  }
               }
            }
@@ -5167,8 +5175,12 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
                  src_chans |= 1 << GET_SWZ(inst->tex_offsets[i].swizzle,
3);
                    for (int c = 0; c < 4; c++) {
-                  if (src_chans & (1 << c))
+                  if (src_chans & (1 << c)) {
                        writes[4 * inst->tex_offsets[i].index + c] = NULL;
+
+                     if (first_reads[inst->tex_offsets[i].index] == -1)
+                        first_reads[inst->tex_offsets[i].index] = level;
+                  }
                  }
               }
            }
@@ -5211,17 +5223,30 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
       * the writemask of other instructions with dead channels.
       */
      foreach_in_list_safe(glsl_to_tgsi_instruction, inst,
&this->instructions) {
-      if (!inst->dead_mask || !inst->dst[0].writemask)
+      bool dead_inst = false;
+
+      if (!inst->dst[0].writemask)
            continue;
+
         /* No amount of dead masks should remove memory stores */
         if (inst->info->is_store)
            continue;
   -      if ((inst->dst[0].writemask & ~inst->dead_mask) == 0) {
+      if (!inst->dead_mask) {
+         if (inst->dst[0].file == PROGRAM_TEMPORARY &&
+             first_reads[inst->dst[0].index] == -1) {
+            dead_inst = true;
+         }
+      } else {
+         if ((inst->dst[0].writemask & ~inst->dead_mask) == 0)
+            dead_inst = true;
+      }
+
+      if (dead_inst) {
            inst->remove();
            delete inst;
            removed++;
-      } else {
+      } else if (inst->dead_mask) {
            if (glsl_base_type_is_64bit(inst->dst[0].type)) {
               if (inst->dead_mask == WRITEMASK_XY ||
                   inst->dead_mask == WRITEMASK_ZW)
@@ -5232,6 +5257,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
      }
        ralloc_free(write_level);
+   ralloc_free(first_reads);
      ralloc_free(writes);
        return removed;

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to