On Mon, Sep 10, 2018 at 5:33 PM Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl> wrote:
> On Mon, Sep 10, 2018 at 8:05 PM Jason Ekstrand <ja...@jlekstrand.net> wrote:
> >
> > The instruction scheduler is re-ordering loads, which is causing fence
> > values to be loaded after the values they're fencing. In particular,
> > consider the following pseudocode:
> >
> > void try_use_a_thing(int idx)
> > {
> >    bool ready = ssbo.arr[idx].ready;
> >    vec4 data = ssbo.arr[idx].data;
> >    if (ready)
> >       use(data);
> > }
> >
> > void write_a_thing(int idx, vec4 data)
> > {
> >    ssbo.arr[idx].data = data;
> >    ssbo.arr[idx].ready = true;
> > }
> >
> > Our current instruction scheduling scheme doesn't see any problem with
> > re-ordering the load of "ready" with respect to the load of "data".
> > However, if try_use_a_thing is called in one thread and write_a_thing is
> > called in another thread, such re-ordering could cause invalid data to
> > be used. Normally, some re-ordering of loads is fine; however, the
> > Vulkan memory model provides some additional guarantees, particularly in
> > the case of atomic loads, which we currently don't differentiate in any
> > way in the back-end.
> >
> > Obviously, we need to come up with something better for this than just
> > shutting off the scheduler, but I wanted to send this out earlier rather
> > than later and provide the opportunity for a discussion.
>
> So how about adding a bitmask with flags for these to load/store
> intrinsics? I remember we still need a coherent bit to implement
> coherent loads and stores for AMD. We could easily have a mask with
> coherent+atomic+whatever and then pass that to the backend?

I think that's where we're going to end up. In the current model, you just
have load/store ops and then barriers to block around them. However, I
suspect that the long-term direction is going to be something more like
SPIR-V's access flags on each load/store, which provide some sort of partial
barrier and re-ordering restriction information. (A rough sketch of what
such a flags mask might look like follows the quoted patch below.)

> > ---
> >  src/intel/compiler/brw_fs.cpp | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> > index 3f7f2b4c984..9df238a6f6a 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -6427,7 +6427,7 @@ fs_visitor::allocate_registers(unsigned min_dispatch_width, bool allow_spilling)
> >      * performance but increasing likelihood of allocating.
> >      */
> >     for (unsigned i = 0; i < ARRAY_SIZE(pre_modes); i++) {
> > -      schedule_instructions(pre_modes[i]);
> > +      //schedule_instructions(pre_modes[i]);
> >
> >        if (0) {
> >           assign_regs_trivial();
> > @@ -6478,7 +6478,7 @@ fs_visitor::allocate_registers(unsigned min_dispatch_width, bool allow_spilling)
> >
> >     opt_bank_conflicts();
> >
> > -   schedule_instructions(SCHEDULE_POST);
> > +   //schedule_instructions(SCHEDULE_POST);
> >
> >     if (last_scratch > 0) {
> >        MAYBE_UNUSED unsigned max_scratch_size = 2 * 1024 * 1024;
> > --
> > 2.17.1
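For concreteness, here is a minimal sketch of the kind of per-intrinsic flags
mask being discussed. All of the names below (access_flags, ACCESS_*,
may_reorder) are hypothetical illustrations, not existing NIR or i965
back-end API, and the re-ordering rule is deliberately conservative:

   /* Hypothetical sketch only: illustrates a flags bitmask that could be
    * attached to each load/store intrinsic and consulted by the scheduler
    * when it builds memory dependencies.
    */
   #include <stdbool.h>
   #include <stdint.h>

   enum access_flags {
      ACCESS_COHERENT = 1 << 0, /* visible to other invocations (coherent) */
      ACCESS_VOLATILE = 1 << 1, /* may change externally; keep in order */
      ACCESS_ATOMIC   = 1 << 2, /* carries memory-model ordering semantics */
   };

   /* May two memory instructions with the given flags swap places?
    * Conservative: nothing moves across an atomic access, coherent
    * accesses stay ordered with respect to each other, and volatile
    * accesses are never re-ordered.
    */
   static bool
   may_reorder(uint32_t flags_a, uint32_t flags_b)
   {
      if ((flags_a | flags_b) & ACCESS_ATOMIC)
         return false;
      if ((flags_a & flags_b) & ACCESS_COHERENT)
         return false;
      if ((flags_a | flags_b) & ACCESS_VOLATILE)
         return false;
      return true;
   }

The scheduler's dependency pass could then call such a query when deciding
whether two memory instructions may be swapped, instead of treating all SSBO
loads as freely re-orderable (or, as in the patch above, not scheduling at
all). In the "ready"/"data" example, marking the "ready" load atomic would
keep it from moving relative to the "data" load.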
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev