On Fri, Mar 23, 2018 at 5:18 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> On Fri, Mar 23, 2018 at 2:15 PM, Karol Herbst <kher...@redhat.com> wrote:
>>
>> On Fri, Mar 23, 2018 at 10:07 PM, Jason Ekstrand <ja...@jlekstrand.net>
>> wrote:
>> > +list
>> >
>> > On Fri, Mar 23, 2018 at 1:45 PM, Karol Herbst <kher...@redhat.com>
>> > wrote:
>> >>
>> >> On Fri, Mar 23, 2018 at 9:30 PM, Jason Ekstrand <ja...@jlekstrand.net>
>> >> wrote:
>> >> > As I've been rewriting core NIR deref handling, I've been thinking
>> >> > about this problem quite a bit. One objective I have is to actually
>> >> > make UBO and SSBO access go through derefs instead of just being an
>> >> > offset and index so that the compiler can better reason about them.
>> >> > In particular, I want to be able to start doing load/store
>> >> > elimination on SSBOs, SLM, and whatever CL has which would be great
>> >> > for everyone's compute performance (GL, Vulkan, CL, etc.).
>> >> >
>> >> > I would be lying if I said I had a full plan but I do have part of a
>> >> > plan. In my patch which adds the deref instructions, I add a new
>> >> > "cast" deref type which takes an arbitrary value as its source and
>> >> > kicks out a deref with a type. Whenever we discover that the source
>> >> > of the cast is actually another deref which is compatible (same type
>> >> > etc.), copy propagation gets rid of the cast for you. The idea is
>> >> > that, instead of doing a load_raw(raw_ptr), you would do a
>> >> > load((type *)raw_ptr).
>> >> >
>> >> > Right now, most of the core NIR optimizations will throw a fit if
>> >> > they ever see a cast. This is intentional because it requires us to
>> >> > manually go through and handle casts. This would mean that, at the
>> >> > moment, you would have to lower to load_raw intrinsics almost
>> >> > immediately after coming out of SPIR-V.
>> >> >
>> >>
>> >> Well it gets more fun with OpenCL 2.0 where you can have generic
>> >> pointers where you only know the type at creation time. You can also
>> >> declare generic pointers as function inputs in a way that you never
>> >> actually know from where you have to load if you only have that one
>> >> function. So the actual load operation depends on when you create the
>> >> initial pointer variable (you can cast from X to generic, but not the
>> >> other way around).
>> >>
>> >> Which in the end means you can end up with load(generic_ptr) and only
>> >> following the chain up to its creation (with function inlining in
>> >> mind) you know the actual memory target.
>> >
>> >
>> > Yup. And there will always be crazy cases where you can't actually
>> > follow it and you have to emit a pile of code to load different ways
>> > depending on some bits somewhere that tell you how to load it. I'm
>> > well aware of the insanity. :-) This is part of the reason why I'm
>> > glad I'm not trying to write an OpenCL 2.0 driver.
>> >
>> > This insanity is exactly why I'm suggesting the pointer casting. Sure,
>> > you may not know the data type until the actual load. In that case,
>> > you end up with the cast being right before the load. If you don't
>> > know the storage class, maybe you have to switch and do multiple casts
>> > based on some bits. Alternatively, if you don't know the storage
>> > class, we can just let the deref mode be 0 for "I don't know", or
>> > maybe multiple bits for "these are the things it might be". In any
>> > case, I think we can handle it.
>> >
>>
>> there shouldn't be a situation where we don't know, except when you
>> don't inline all functions. I think Rob had the idea of fat pointers
>> where a pointer is a vec2 and the 2nd component contains the actual
>> pointer type and you end up with a switch over the type to get the
>> correct storage class. And if the compiler inlines all functions, it
>> should be able to optimize that switch away.
>
>
> Right. Today, we live in a world where all functions are inlined. Sadly, I
> fear that world may come to an end one of these days. :(
>
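
For reference, here is a rough sketch of the fat-pointer idea quoted above, written as plain C rather than NIR. Everything in it (the enum tags, struct fat_ptr, to_generic(), load_generic(), the three backing arrays) is made up for illustration and is not an existing Mesa or NIR interface. It only shows the shape of the lowering: the second component carries the storage class, and a generic load turns into a switch over it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical storage-class tags; illustrative names, not NIR modes. */
enum storage_class { SC_GLOBAL = 0, SC_LOCAL = 1, SC_PRIVATE = 2 };

/* The "vec2" fat pointer: component 0 is the offset, component 1 the tag. */
struct fat_ptr {
    uint32_t offset;
    uint32_t sclass;
};

#define MEM_WORDS 16

/* Stand-ins for the separate address spaces the hardware exposes. */
static uint32_t global_mem[MEM_WORDS];
static uint32_t local_mem[MEM_WORDS];
static uint32_t private_mem[MEM_WORDS];

/* Casting X -> generic records where the pointer came from. */
static struct fat_ptr to_generic(enum storage_class sc, uint32_t offset)
{
    struct fat_ptr p = { offset, (uint32_t)sc };
    return p;
}

/* A generic load becomes a switch over the tag. When the tag is a
 * compile-time constant (everything inlined), only one case survives. */
static uint32_t load_generic(struct fat_ptr p)
{
    assert(p.offset < MEM_WORDS);
    switch (p.sclass) {
    case SC_GLOBAL:  return global_mem[p.offset];
    case SC_LOCAL:   return local_mem[p.offset];
    case SC_PRIVATE: return private_mem[p.offset];
    default:         return 0; /* unknown tag; a real lowering would reject this */
    }
}
```

With all functions inlined, a call such as load_generic(to_generic(SC_LOCAL, 3)) has a constant tag, so ordinary constant folding can leave just the local_mem access. Without inlining, the switch has to stay.
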

fwiw, so far I'm mostly caring about the inline-all-the-fxns case..
for the cases where we don't know what sort of pointer we have, Karol
(iirc?) suggested name-mangling functions, which seems semi-sane.. but
I've mostly tried to ignore that for now until we have more basic
things working.

Possibly we need a compiler option to lower everything to
load/store_global (or maybe "raw" is a better name?) for hw that can
remap local memory into a single address space and use the same
load/store instructions. I think that should be at least enough to
move forward with nv hw + fxn calls. Less so for intel/adreno but
from my PoV I'm willing to solve that problem later.

BR,
-R
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev