On Fri, Sep 16, 2016 at 5:59 PM, Francisco Jerez <curroje...@riseup.net> wrote:
> Jason Ekstrand <ja...@jlekstrand.net> writes: > > > On Sep 16, 2016 3:04 PM, "Francisco Jerez" <curroje...@riseup.net> > wrote: > >> > >> Not intended for upstream. Should cause a GPU hang if some thread is > >> executed with a non-contiguous dispatch mask breaking assumptions of > >> brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or > >> Piglit regressions, while replacing brw_stage_has_packed_dispatch() > >> with a dummy implementation that unconditionally returns true on top > >> of this patch causes multiple GPU hangs. > >> --- > >> src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 17 +++++++++++++++++ > >> src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 +++++++++++++++++++++ > >> 2 files changed, 38 insertions(+) > >> > >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >> index 042203d..b3eec49 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > >> @@ -33,6 +33,23 @@ using namespace brw::surface_access; > >> void > >> fs_visitor::emit_nir_code() > >> { > >> + if (brw_stage_has_packed_dispatch(stage, prog_data)) { > > > > Mind adding "0 &&" and merging this patch so we remain aware of the > issue, > > keep it building, and can easily test future hardware. > > > I guess that would work -- An alternative that would keep the > NIR-to-i965 pass tidier would be to assert(devinfo->gen <= 9) in > brw_stage_has_packed_dispatch() to make sure we don't forget to do a > full Piglit/DEQP/CTS run with this patch applied when a new generation > is powered on. I can also add a comment with a link to this patch. > Can we do both? Maybe something like this instead of an assert: if (devinfo->gen > 9) { static bool warned = false; if (!warned) { fprintf(stderr, "WARNING: VMask/DMask power-of-two assumptions need to be verified. Once verified using emit_fs_mask_pow2_check(), this warning may be disabled for gen%d", devinfo->gen); warned = true; } } where emit_fs_mask_pow2_check() is a helper that we put the check code into. That way it's not an assert-failure that will get instantly removed but something that will constantly bug the bring-up person until they have verified the assumption. > >> + const fs_builder ubld = bld.exec_all().group(1, 0); > >> + const fs_reg tmp = component(bld.vgrf(BRW_REGISTER_TYPE_UD), 0); > >> + const fs_reg mask = (stage == MESA_SHADER_FRAGMENT ? > > brw_vmask_reg() : > >> + brw_dmask_reg()); > >> + > >> + ubld.ADD(tmp, mask, brw_imm_ud(1)); > >> + ubld.AND(tmp, mask, tmp); > >> + > >> + /* This will loop forever if the dispatch mask doesn't have the > > expected > >> + * form '2^n-1', in which case tmp will be non-zero. > >> + */ > >> + bld.emit(BRW_OPCODE_DO); > >> + bld.CMP(bld.null_reg_ud(), tmp, brw_imm_ud(0), > BRW_CONDITIONAL_NZ); > >> + set_predicate(BRW_PREDICATE_NORMAL, bld.emit(BRW_OPCODE_WHILE)); > >> + } > >> + > >> /* emit the arrays used for inputs and outputs - load/store > > intrinsics will > >> * be converted to reads/writes of these arrays > >> */ > >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > >> index ba3bbdf..9f7a1f0 100644 > >> --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > >> @@ -35,6 +35,27 @@ namespace brw { > >> void > >> vec4_visitor::emit_nir_code() > >> { > >> + if (brw_stage_has_packed_dispatch(stage, &prog_data->base)) { > >> + const dst_reg tmp = writemask(dst_reg(this, > glsl_type::uint_type), > >> + WRITEMASK_X); > >> + const src_reg mask = > >> + > > brw_swizzle(retype(stride(brw_vec4_reg(BRW_ARCHITECTURE_REGISTER_FILE, > > BRW_ARF_STATE, 0), > >> + 0, 4, 1), > >> + BRW_REGISTER_TYPE_UD), > >> + BRW_SWIZZLE_ZZZZ); > > > > Can we just do vec4_reg(brw_vmask_reg)? > > > That didn't seem to work, because both brw_dmask_reg() and > brw_vmask_reg() return a register region that is equivalent to > sr0.0.xxxx in Align16 addressing mode (maybe a bug of PATCH 1? It's > unlikely to matter a lot in practice though), so the Z component where > the DMask is stored in is not even part of the returned Align16 region. > Right. As I said in another patch, I'm not terribly concerned about vec4. We've shown it works and we won't be bringing up more vec4. I'm fine with just merging the fs version. > >> + > >> + emit(ADD(tmp, mask, brw_imm_ud(1))); > >> + emit(AND(tmp, mask, src_reg(tmp))); > >> + > >> + /* This will loop forever if the dispatch mask doesn't have the > > expected > >> + * form '2^n-1', in which case tmp will be non-zero. > >> + */ > >> + emit(BRW_OPCODE_DO); > >> + emit(CMP(dst_null_ud(), src_reg(tmp), brw_imm_ud(0), > >> + BRW_CONDITIONAL_NZ)); > >> + emit(BRW_OPCODE_WHILE)->predicate = BRW_PREDICATE_NORMAL; > >> + } > >> + > >> if (nir->num_uniforms > 0) > >> nir_setup_uniforms(); > >> > >> -- > >> 2.9.0 > >> >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev