On Wed, Oct 4, 2017 at 5:35 PM, Jason Ekstrand <[email protected]> wrote:
> On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <[email protected]> wrote: > >> This won't completely solve the problem. For example, what if you >> hoist the assignment to color2 outside the loop? >> >> vec4 color2; >> while (1) { >> vec4 color = texture(); >> color2 = color * 2; >> if (...) { >> break; >> } >> } >> gl_FragColor = color2; >> >> >> Now the definition still dominates the use, even with the modified >> control-flow graph, and you have the same problem > > > Curro had me convinced that some detail of the liveness analysis pass > saved us here but now I can't remember what. :-( > > >> The real problem is >> that the assignment to color2 is really a conditional assignment: if >> we're going channel-by-channel, it's not, but if you consider the >> *whole* register at the same time, it is. To really fix the problem, >> you need to model exactly what the machine actually does: you need to >> insert "fake" edges like these, that model the jumps that the machine >> can take, and you need to make every assignment a conditional >> assignment (i.e. it doesn't kill the register). It's probably not as >> bad with Curro's patch on top, though. Also, once you do this you can >> make register allocation more accurate by generating interferences >> from the liveness information directly instead of from the intervals. >> >> One thing I've thought about is, in addition to maintaining this >> "whole-vector" view of things, is to maintain a "per-channel" liveness >> that doesn't use the extra edges, partial definitions etc. and then >> use the "per-channel view" to calculate interference when the channels >> always line up. >> > > Yes, we've considered that and it's a good idea. However, I'm trying to > fix bugs right now, not write the world's best liveness analysis pass. :-) > You're correct, as usual... I've inspected the result of liveness anlaysis and we do indeed get it wrong. I'll come up with something less bogus. > On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <[email protected]> >> wrote: >> > Shader-db results on Sky Lake: >> > >> > total instructions in shared programs: 12955125 -> 12953698 (-0.01%) >> > instructions in affected programs: 55956 -> 54529 (-2.55%) >> > helped: 6 >> > HURT: 38 >> > >> > All of the hurt programs were hurt by exactly one instruction because >> > this patch affects copy propagation. Most of the helped instructions >> > came from a single orbital explorer shader that was helped by 14.26% >> > >> > Cc: [email protected] >> > --- >> > src/intel/compiler/brw_cfg.cpp | 37 ++++++++++++++++++++++++++++++ >> +++++-- >> > 1 file changed, 35 insertions(+), 2 deletions(-) >> > >> > diff --git a/src/intel/compiler/brw_cfg.cpp >> b/src/intel/compiler/brw_cfg.cpp >> > index fad12ee..d8bf725 100644 >> > --- a/src/intel/compiler/brw_cfg.cpp >> > +++ b/src/intel/compiler/brw_cfg.cpp >> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions) >> > assert(cur_while != NULL); >> > cur->add_successor(mem_ctx, cur_while); >> > >> > + /* We also add the next block as a successor of the break. >> If the >> > + * break is predicated, we need to do this because the break >> may not >> > + * be taken. If the break is not predicated, we add it >> anyway so >> > + * that our live intervals computations will operate as if >> the break >> > + * may or may not be taken. Consider the following example: >> > + * >> > + * vec4 color2; >> > + * while (1) { >> > + * vec4 color = texture(); >> > + * if (...) { >> > + * color2 = color * 2; >> > + * break; >> > + * } >> > + * } >> > + * gl_FragColor = color2; >> > + * >> > + * In this case, the definition of color2 dominates the use >> because >> > + * the loop only has the one exit. This means that the live >> range >> > + * interval for color2 goes from the statement in the if to >> it's use >> > + * below the loop. Now suppose that the texture operation >> has a >> > + * header register that gets assigned one of the registers >> used for >> > + * color2. If the loop condition is non-uniform and some of >> the >> > + * threads will take the break and others will continue. In >> this >> > + * case, the next pass through the loop, the WE_all setup of >> the >> > + * header register will stomp the disabled channels of color2 >> and >> > + * corrupt the value. >> > + * >> > + * This same problem can occur if you have a mix of 64, 32, >> and >> > + * 16-bit registers because the channels do not line up or if >> you >> > + * have a SIMD16 program and the first half of one value >> overlaps the >> > + * second half of the other. To solve it, we simply treat >> the break >> > + * as if it may also continue on because some of the threads >> may >> > + * continue on. >> > + */ >> > next = new_block(); >> > - if (inst->predicate) >> > - cur->add_successor(mem_ctx, next); >> > + cur->add_successor(mem_ctx, next); >> > >> > set_next_block(&cur, next, ip); >> > break; >> > -- >> > 2.5.0.400.gff86faf >> > >> > _______________________________________________ >> > mesa-dev mailing list >> > [email protected] >> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > >
_______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
