On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote: > Tested on Ivybridge, Haswell and Broadwell. > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > --- > src/mesa/drivers/dri/i965/brw_compute.c | 39 > ++++++++++++++++++++++++++++++++- > src/mesa/drivers/dri/i965/brw_defines.h | 1 + > 2 files changed, 39 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_compute.c > b/src/mesa/drivers/dri/i965/brw_compute.c > index baed701..06ef448 100644 > --- a/src/mesa/drivers/dri/i965/brw_compute.c > +++ b/src/mesa/drivers/dri/i965/brw_compute.c > @@ -31,12 +31,49 @@ > #include "brw_draw.h" > #include "brw_state.h" > #include "intel_batchbuffer.h" > +#include "brw_defines.h" > > > static void > brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups) > { > - _mesa_problem(&brw->ctx, "TODO: implement brw_emit_gpgpu_walker"); > + const struct brw_cs_prog_data *prog_data = brw->cs.prog_data; > + > + const unsigned simd_size = prog_data->simd_size; > + unsigned group_size = prog_data->local_size[0] * > + prog_data->local_size[1] * prog_data->local_size[2]; > + unsigned thread_width_max = > + (group_size + simd_size - 1) / simd_size; > + > + uint32_t right_mask = (1u << simd_size) - 1; > + const unsigned right_non_aligned = group_size & (simd_size - 1); > + if (right_non_aligned != 0) > + right_mask >>= (simd_size - right_non_aligned);
I think this is equvalent to: uint32_t right_mask = (1u << (simd_size - (group_size % simd_size))) - 1; which might be a bit simpler... > + BEGIN_BATCH(dwords); > + OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2)); I was going to suggest splitting this into separate Gen8+ and Gen7 blocks, but now that I look at the code...these two are slightly different indirect handling, and the later one is just a DWord of MBZ, so...it's not really that different. I think what you have is fine :) > + uint32_t dwords = brw->gen < 8 ? 11 : 15; > + OUT_BATCH(0); > + if (brw->gen >= 8) { > + OUT_BATCH(0); > + OUT_BATCH(0); > + } > + assert(thread_width_max <= brw->max_cs_threads); > + OUT_BATCH(((simd_size == 8) ? 0 : 1) << 30 | You might want to write this as ((simd_size / 8) - 1). That will work for SIMD8/16/32. Topi would probably suggest using SET_FIELD, i.e. #define BRW_GPGPU_SIMD_SIZE_SHIFT 30 #define BRW_GPGPU_SIMD_SIZE_MASK INTEL_MASK(31, 30) SET_FIELD((simd_size / 8) - 1, BRW_GPGPU_SIMD_SIZE) It's probably a good idea here too. > + (thread_width_max - 1)); Don't you need to set the thread height/depth maximums as well? I'm not really sure how this works. > + OUT_BATCH(0); It'd be nice to label the 0's, i.e. OUT_BATCH(0); /* Thread Group ID Starting X */ OUT_BATCH(num_groups[0]); /* Thread Group ID X Dimension */ With those changes, the whole series is: Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> I haven't verified that these execution masks are really what you want. You know more about this than I do. :) > + if (brw->gen >= 8) > + OUT_BATCH(0); > + OUT_BATCH(num_groups[0]); > + OUT_BATCH(0); > + if (brw->gen >= 8) > + OUT_BATCH(0); > + OUT_BATCH(num_groups[1]); > + OUT_BATCH(0); > + OUT_BATCH(num_groups[2]); > + OUT_BATCH(right_mask); > + OUT_BATCH(0xffffffff); > + ADVANCE_BATCH(); > } > > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index 36f46af..cd25511 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode { > > #define MEDIA_VFE_STATE 0x7000 > #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002 > +#define GPGPU_WALKER 0x7105 > > #endif >
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev