Re: [Mesa-dev] [PATCH 1/2] r600g: report correct flow control depth, taking hardware bugs into account

Vadim Girlin Sun, 03 Feb 2013 12:57:42 -0800

On 02/03/2013 10:46 PM, Marek Olšák wrote:

The shader test passes on my Redwood. I also added more ifs and loops,
no difference. I'll just set the max control flow depth to 32 for all
asics. We'll deal with the hw bug when it shows up.

I thought a bit more about it, probably the test simply doesn't triggerthe bug. The description says "CF_INST_ALU* instructions do not save theactive mask correctly", but this test doesn't really check if the savedmask is correct after restoring it from the stack, that is, after goingup one level out of the problematic branch and doing something there.Also perhaps less static and more diverging branches may help toreproduce it.

Anyway, I agree that it's better not to limit anything until the bugshows up so that we better understand what we are fixing and whether thefix really helps. In the worst case I think we can also use separatePUSH, ALU and POP instructions - it shouldn't be hard to implement.


Vadim


Marek

On Fri, Feb 1, 2013 at 4:34 PM, Vadim Girlin <vadimgir...@gmail.com> wrote:

On 02/01/2013 05:47 PM, Marek Olšák wrote:


On Fri, Feb 1, 2013 at 6:12 AM, Vadim Girlin <vadimgir...@gmail.com>
wrote:


On 02/01/2013 03:20 AM, Marek Olšák wrote:



---
    src/gallium/drivers/r600/r600_pipe.c |    9 +++++++--
    1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c
b/src/gallium/drivers/r600/r600_pipe.c
index a59578d..1698cb3 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -640,6 +640,8 @@ static float r600_get_paramf(struct pipe_screen*
pscreen,

    static int r600_get_shader_param(struct pipe_screen* pscreen,
unsigned
shader, enum pipe_shader_cap param)
    {
+       struct r600_screen *rscreen = (struct r600_screen *)pscreen;
+
          switch(shader)
          {
          case PIPE_SHADER_FRAGMENT:
@@ -654,7 +656,6 @@ static int r600_get_shader_param(struct pipe_screen*
pscreen, unsigned shader, e
                  return 0;
          }

-       /* XXX: all these should be fixed, since r600 surely supports
much
more! */
          switch (param) {
          case PIPE_SHADER_CAP_MAX_INSTRUCTIONS:
          case PIPE_SHADER_CAP_MAX_ALU_INSTRUCTIONS:
@@ -662,7 +663,11 @@ static int r600_get_shader_param(struct
pipe_screen*
pscreen, unsigned shader, e
          case PIPE_SHADER_CAP_MAX_TEX_INDIRECTIONS:
                  return 16384;
          case PIPE_SHADER_CAP_MAX_CONTROL_FLOW_DEPTH:
-               return 8; /* XXX */
+               /* There is a bug on certain Evergreen cards which
limits
+                * the control flow depth. */
+               return rscreen->chip_class == EVERGREEN &&
+                      rscreen->family != CHIP_CYPRESS &&
+                      rscreen->family != CHIP_HEMLOCK ? 3 : 32;




I guess we can use more strict condition regarding the affected chips if
the
bug description in the evergreen isa pdf is correct ("Chapter 4. ALU
Clauses"):

NOTE: For the 54xx and 55xx AMD GPU series only, the CF_INST_ALU*
instructions do not save the active mask correctly. The branching can be
wrong,
possibly producing incorrect results and infinite loops. The three
possible work-
arounds are:

a. Avoid using the CF_ALU_PUSH_BEFORE, CF_ALU_ELSE_AFTER
CF_ALU_BREAK, and CF_ALU_CONTINUE instructions.
b. Do not use the CF_INST_ALU* instructions when your stack depth
exceeds three elements (not entries); for the 54XX series AMD GPUs,
do not exceed a stack size of seven, since this GPU series has a vector
size 32.
c. Do not use these instructions when your non-zero stack depth mod 4 is
0 (or mod 8 is 0, for vector size 32).




E.g. it seems juniper isn't affected according to the doc.



I was talking with Tom about it and he thinks Juniper is affected as well.


Also I'm not sure how the meaning of the "max control flow depth" in
gallium
maps to the hw's meaning - I suspect each loop is counted as a single
level,
but hw uses 4 or 8 stack elements (subentries) per loop (EG ISA PDF,
Section
"3.6.5 Stack Allocation"). So it seems we can't use ALU_xxx instructions
even in a single top-level loop on affected chips, if I understand it
right.



There are 3 variables affecting control flow:
- Maximum if depth, used for if flattening. The maximum control flow
depth CAP maps exactly to this variable.
- Maximum number of allowed iterations for a loop to be unrolled. The
catch is the loop analysis in the GLSL compiler must successfully
determine the number of iterations. Then, if the number of iterations
of a loop is less or equal to this variable, the loop will be
unrolled. We can't control this variable from the driver yet, but the
state tracker can. The default is 255.
- There is also an upper limit on the size of the loop body, which
also prevents loop unrolling. This is sort of hardcoded in the
lowering pass.

BTW, I think I have never had an issue with Redwood. I guess we don't
have a piglit test using enough loops.



I never seen any related issues with my Juniper as well. I was also
wondering why there are no bug reports for that despite that we don't have
any workarounds.

I think we can ask proprietary compiler (shader analyzer). I tried the
following shader:

uniform int a,b,c,d;
uniform float k;
void main() {
         float f = 0.0;
         if (a > 1) {
                 if (b > 1) {
                         if (c > 1) {
                                 if (d > 1) {
                                         f = k;
                                 }
                         }
                 }
         }
         gl_FragColor = vec4(f);
}

For Cedar (5450) it produces the following code for the inner if:

...


06 PUSH ADDR(11) POP_CNT(3)
07 ALU: ADDR(36) CNT(1) KCACHE0(CB0:0-15)
       4  x: PREDGT_INT  ____,  KC0[3].x,  1      UPDATE_EXEC_MASK
UPDATE_PRED
08 JUMP  POP_CNT(4) ADDR(11) VALID_PIX


...

The first two instructions are PUSH and ALU, instead of ALU_PUSH_BEFORE. The
same code is produced for Redwood (5670), Caicos (6450), Turks (6670), Barts
(6870). I think it's exactly the workaround for that bug.

On Juniper (5770) and Cypress(5870) it uses ALU_PUSH_BEFORE.

I've tried to create shader_test based on this shader (see attached file),
it produces four nested ALU_PUSH_BEFORE with r600g and passes with my
juniper, though I'm not sure if it can really detect the fail.

Vadim


Marek


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/2] r600g: report correct flow control depth, taking hardware bugs into account

Reply via email to