On Thu, 2015-10-22 at 16:38 +0200, Iago Toral wrote: > On Thu, 2015-10-22 at 09:39 -0400, Connor Abbott wrote: > > On Thu, Oct 22, 2015 at 7:21 AM, Iago Toral Quiroga <ito...@igalia.com> > > wrote: > > > I implemented this first as a separate optimization pass in GLSL IR [1], > > > but > > > Curro pointed out that this being pretty much a restricted form of a CSE > > > pass > > > it would probably make more sense to do it inside CSE (and we no longer > > > have > > > a CSE pass in GLSL IR). > > > > > > Unlike other things we CSE in NIR, in the case of SSBO loads we need to > > > make > > > sure that we invalidate previous entries in the set in the presence of > > > conflicting instructions (i.e. SSBO writes to the same block and offset) > > > or > > > in the presence of memory barriers. > > > > > > If this is accepted I intend to extend this to also cover image reads, > > > which > > > follow similar behavior. > > > > > > No regressions observed in piglit or dEQP's SSBO functional tests. > > > > > > [1] > > > http://lists.freedesktop.org/archives/mesa-dev/2015-October/097718.html > > > > > > Iago Toral Quiroga (2): > > > nir/cse: invalidate SSBO loads in presence of ssbo writes or memory > > > barriers > > > nir/instr_set: allow rewrite of SSBO loads > > > > > > src/glsl/nir/nir_instr_set.c | 24 ++++++-- > > > src/glsl/nir/nir_opt_cse.c | 142 > > > +++++++++++++++++++++++++++++++++++++++++++ > > > 2 files changed, 162 insertions(+), 4 deletions(-) > > > > > > -- > > > 1.9.1 > > > > > > _______________________________________________ > > > mesa-dev mailing list > > > mesa-dev@lists.freedesktop.org > > > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > NAK, this isn't going to work. NIR CSE is designed for operations > > which can be moved around freely as long they're still dominated by > > the SSA values they use. It makes heavy advantage of this to avoid > > looking at the entire CFG and instead only at the current block and > > its parents in the dominance tree. For example, imagine you have > > something like: > > > > A = load_ssbo 0 > > if (cond) { > > store_ssbo 0 > > } > > B = load_ssbo 0 > > > > Then A and B can't be combined, but CSE will combine them anyways when > > it reaches B because it keeps a hash table of values dominating B and > > finds A as a match. It doesn't look at the if conditional at all > > because it doesn't dominate the load to B. This is great when you want > > to CSE pure things that don't depend on other side effects -- after > > all, this is the sort of efficiency that SSA is supposed to give us -- > > but it means that as-is, it can't be used for e.g. SSBO's and images > > without completely changing how the pass works and making it less > > efficient. > > Ugh! One would think that at least one of the 2000+ SSBO tests in dEQP > would catch something like this... I guess not :(.
However, I have just tested this and it works just fine. See: buffer Fragments { vec4 v; }; out vec4 color; void main() { vec4 tmp = v; if (tmp.x > 0) { v = vec4(0, 1, 0, 1); } color = v; } And the final NIR SSA form for this is: impl main { block block_0: /* preds: */ vec1 ssa_0 = load_const (0x00000000 /* 0.000000 */) vec4 ssa_1 = load_const (0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */, 0x00000000 /* 0.000000 */, 0x3f800000 /* 1.000000 */) vec4 ssa_2 = intrinsic load_ssbo (ssa_0) () (0) vec1 ssa_3 = flt ssa_0, ssa_2 /* succs: block_1 block_2 */ if ssa_3 { block block_1: /* preds: block_0 */ intrinsic store_ssbo (ssa_1, ssa_0) () (0, 15) /* succs: block_3 */ } else { block block_2: /* preds: block_0 */ /* succs: block_3 */ } block block_3: /* preds: block_1 block_2 */ vec4 ssa_4 = intrinsic load_ssbo (ssa_0) () (0) intrinsic store_output (ssa_4) () (0) /* color */ /* succs: block_4 */ block block_4: } What is going on here is that block 1 is in block0->dom_children, so the CSE pass looks into that, sees the store and invalidates the first SSBO load as I was initially hoping that it would. I guess this behavior is not expected then? Iago > > Now, that being said, I still think that we should definitely be doing > > this sort of thing in NIR now that we've finally added support for > > SSBO's and images. We've been trying to avoid adding new optimizations > > to GLSL, since we've been trying to move away from it. In addition, > > with SPIR-V on the way, anything added to GLSL IR now is something > > that we won't be able to use with SPIR-V shaders. Only doing it in FS > > doesn't sound so great either; we should be doing as much as possible > > at the midlevel, and combining SSBO loads is something that isn't > > FS-specific at all. > > Yeah, agreed. > > > There are two ways I can see support for this being added to NIR: > > > > 1. Add an extra fake source/destination to intrinsics with side > > effects, and add a pass to do essentially a conversion to SSA that > > wires up these "token" sources/destinations, or perhaps extend the > > existing to-SSA pass. > > > > 2. Add a special "load-combining" pass that does some dataflow > > analysis or similar (or, for now, only looks at things within a single > > block). > > > > The advantage of #1 is that we get to use existing NIR passes, like > > CSE, DCE, and GCM "for free" on SSBO loads and stores, without having > > to do the equivalent thing using dataflow analysis. Also, doing store > > forwarding (i.e. replacing the result of an SSBO load with the value > > corresponding to a store, if we can figure out which store affects it) > > is going to much easier. However, #1 is going to be much more of a > > research project. I've thought about how we could do it, but I'm still > > not sure how it could be done feasibly and still be correct. > > Thanks for sharing these ideas. #1 looks like the best way to go in > terms of benefits (although it looks rather artificial!), however I am > not sure that my understanding of NIR at this moment is good enough to > pursue something like that. Also, I would really like to see some sort > of support for this landing soon, even if limited, since having 16 SSBO > loads for a single matrix multiplication like the one I mention in the > commit log for my second patch is really bad, and even the simplest > version of #2 would address that, so I think I'll give #2 a try. > > Iago > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev