I've submitted an alternative to this approach here [1], more along the
lines of the v1 patch you borrowed from my jenkins branch.  Most of the
reasons for that we have discussed already in the office (except for the
last point), but FTR:

 - This patch only works around the problem by tweaking the live
   intervals, which makes the live intervals inconsistent with the block
   live sets (the latter are used by some back-end passes that this
   patch isn't going to help with, like DCE and scheduling).

 - This patch doesn't fix the same bug on the VEC4 back-end.

 - I feel that the alternative approach [1] admits more straightforward
   extension to more exotic control-flow constructs.

 - It doesn't actually fix the problem for all possible patterns of
   divergent loop execution.  Consider the following GLSL pseudocode:

   | b = 0;
   | do {
   |    use(b);
   |    b = non_uniform_condition();
   |    if (b)
   |       continue;
   | 
   |    stomp_neighboring_channels();
   |    break;
   | 
   | } while(true);

   With this series applied, the live interval of 'b' will be calculated
   to be between the 'b = 0' and 'continue' statements, which allows
   stomp_neighboring_channels() to overwrite the contents of 'b'
   depending on how the register allocator lays things out in the
   register file.  OTOH with [1] applied the live interval calculated
   for 'b' will be the region between the 'b = 0' and 'break'
   statements, which overlaps the whole region of divergent control flow
   as expected.

Cheers.

[1] https://lists.freedesktop.org/archives/mesa-dev/2017-October/174591.html

Jason Ekstrand <ja...@jlekstrand.net> writes:

> No Shader-db changes.
>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
> ++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index c449672..380060d 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>           }
>        }
>     }
> +
> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to be 
> a
> +    * bit more careful with live ranges and loops.  Consider the following
> +    * example:
> +    *
> +    *    vec4 color2;
> +    *    while (1) {
> +    *       vec4 color = texture();
> +    *       if (...) {
> +    *          color2 = color * 2;
> +    *          break;
> +    *       }
> +    *    }
> +    *    gl_FragColor = color2;
> +    *
> +    * In this case, the definition of color2 dominates the use because the
> +    * loop only has the one exit.  This means that the live range interval 
> for
> +    * color2 goes from the statement in the if to it's use below the loop.
> +    * Now suppose that the texture operation has a header register that gets
> +    * assigned one of the registers used for color2.  If the loop condition 
> is
> +    * non-uniform and some of the threads will take the and others will
> +    * continue.  In this case, the next pass through the loop, the WE_all
> +    * setup of the header register will stomp the disabled channels of color2
> +    * and corrupt the value.
> +    *
> +    * This same problem can occur if you have a mix of 64, 32, and 16-bit
> +    * registers because the channels do not line up or if you have a SIMD16
> +    * program and the first half of one value overlaps the second half of the
> +    * other.
> +    *
> +    * To solve this problem, we take any VGRFs whose live ranges cross the
> +    * while instruction of a loop and extend their live ranges to the top of
> +    * the loop.  This more accurately models the hardware because the value 
> in
> +    * the VGRF needs to be carried through subsequent loop iterations in 
> order
> +    * to remain valid when we finally do break.
> +    */
> +   foreach_block (block, cfg) {
> +      if (block->end()->opcode != BRW_OPCODE_WHILE)
> +         continue;
> +
> +      /* This is a WHILE instrution. Find the DO block. */
> +      bblock_t *do_block = NULL;
> +      foreach_list_typed(bblock_link, child_link, link, &block->children) {
> +         if (child_link->block->start_ip < block->end_ip) {
> +            assert(do_block == NULL);
> +            do_block = child_link->block;
> +         }
> +      }
> +      assert(do_block);
> +
> +      for (int i = 0; i < num_vars; i++) {
> +         if (start[i] < block->end_ip && end[i] > block->end_ip)
> +            start[i] = MIN2(start[i], do_block->start_ip);
> +      }
> +   }
>  }
>  
>  fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
> -- 
> 2.5.0.400.gff86faf
>
> _______________________________________________
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable

Attachment: signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to