Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] Race condition between concurrent engines due to per-context toggling of shared global MCR register `LSC_CHICKEN_BIT_0`. - [High] Invalid cross-domain hardware programming by unconditionally emitting Render/Compute workaround on Video (VCS) and Copy (BCS) engines. --
commit 77b4f585118e5a808b879e93ddf1a2f08c7c13c1 Author: Jia Yao <[email protected]> Subject: drm/i915/dg2: Add per-context control for Wa_22013059131 This commit introduces a per-context control for Wa_22013059131, which toggles FORCE_1_SUB_MESSAGE_PER_FRAGMENT in LSC_CHICKEN_BIT_0 at engine init. It adds I915_CONTEXT_PARAM_WA_22013059131 to allow userspace to manage the workaround itself, skipping the bit 15 programming. LSC_CHICKEN_BIT_0 is dynamically rewritten on every context switch to avoid leaking state between contexts. Link: https://github.com/intel/compute-runtime/pull/919 > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c > b/drivers/gpu/drm/i915/gt/intel_lrc.c > index 147d22907960b..13344ebb847ec 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c [ ... ] > @@ -1348,6 +1348,37 @@ gen12_invalidate_state_cache(u32 *cs) > return cs; > } > > +static u32 * > +dg2_g11_emit_wa_22013059131(const struct intel_context *ce, u32 *cs) > +{ [ ... ] > + if (rcu_access_pointer(ce->gem_context) && > + !test_bit(CONTEXT_WA_22013059131, &ce->flags)) { > + val |= FORCE_1_SUB_MESSAGE_PER_FRAGMENT; > + } > + > + *cs++ = MI_LOAD_REGISTER_IMM(1); > + *cs++ = i915_mmio_reg_offset(LSC_CHICKEN_BIT_0); > + *cs++ = val; [Severity: High] Does dynamically overwriting LSC_CHICKEN_BIT_0 using MI_LOAD_REGISTER_IMM during an engine-local context switch create a race condition? Because LSC_CHICKEN_BIT_0 is a shared Multicast (MCR) register across the DSS, concurrent engines (like RCS and CCS) execute independently. Would a context switch on one engine clobber the workaround state of another context actively executing on a concurrent engine, leading to GPU hangs or workload corruption? > + > + return cs; > +} > + [ ... ] > @@ -1387,7 +1422,13 @@ gen12_emit_indirect_ctx_xcs(const struct intel_context > *ce, u32 *cs) > > PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE, > 0); > > - return gen12_emit_aux_table_inv(ce->engine, cs); > + cs = gen12_emit_aux_table_inv(ce->engine, cs); > + > + /* Wa_22013059131:dg2 */ > + if (IS_DG2_G11(ce->engine->i915)) > + cs = dg2_g11_emit_wa_22013059131(ce, cs); [Severity: High] Does gen12_emit_indirect_ctx_xcs() incorrectly emit a Render/Compute register write on Video (VCS) and Copy (BCS) engines? Since gen12_emit_indirect_ctx_xcs() is called to build the indirect context batch buffer for all non-Render engines, this unconditionally emits the MI_LOAD_REGISTER_IMM to LSC_CHICKEN_BIT_0. As LSC (Load/Store Cache) is part of the Xe-Core (Render/Compute domain), Video and Blitter engines don't have access to this register domain. Will this invalid cross-domain hardware programming cause command exceptions, dropped writes, or command streamer hangs? Would it be better to add an engine class check here, such as (ce->engine->class == COMPUTE_CLASS)? > + > + return cs; > } -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
