Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:
- [High] Race condition between concurrent engines due to per-context toggling 
of shared global MCR register `LSC_CHICKEN_BIT_0`.
- [High] Invalid cross-domain hardware programming by unconditionally emitting 
Render/Compute workaround on Video (VCS) and Copy (BCS) engines.
--

commit 77b4f585118e5a808b879e93ddf1a2f08c7c13c1
Author: Jia Yao <[email protected]>
Subject: drm/i915/dg2: Add per-context control for Wa_22013059131

This commit introduces a per-context control for Wa_22013059131, which
toggles FORCE_1_SUB_MESSAGE_PER_FRAGMENT in LSC_CHICKEN_BIT_0 at engine
init. It adds I915_CONTEXT_PARAM_WA_22013059131 to allow userspace to
manage the workaround itself, skipping the bit 15 programming.
LSC_CHICKEN_BIT_0 is dynamically rewritten on every context switch to
avoid leaking state between contexts.

Link: https://github.com/intel/compute-runtime/pull/919

> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 147d22907960b..13344ebb847ec 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
[ ... ]
> @@ -1348,6 +1348,37 @@ gen12_invalidate_state_cache(u32 *cs)
>       return cs;
>  }
>  
> +static u32 *
> +dg2_g11_emit_wa_22013059131(const struct intel_context *ce, u32 *cs)
> +{
[ ... ]
> +     if (rcu_access_pointer(ce->gem_context) &&
> +         !test_bit(CONTEXT_WA_22013059131, &ce->flags)) {
> +             val |= FORCE_1_SUB_MESSAGE_PER_FRAGMENT;
> +     }
> +
> +     *cs++ = MI_LOAD_REGISTER_IMM(1);
> +     *cs++ = i915_mmio_reg_offset(LSC_CHICKEN_BIT_0);
> +     *cs++ = val;

[Severity: High]
Does dynamically overwriting LSC_CHICKEN_BIT_0 using MI_LOAD_REGISTER_IMM
during an engine-local context switch create a race condition?

Because LSC_CHICKEN_BIT_0 is a shared Multicast (MCR) register across the DSS,
concurrent engines (like RCS and CCS) execute independently. Would a context
switch on one engine clobber the workaround state of another context actively
executing on a concurrent engine, leading to GPU hangs or workload corruption?

> +
> +     return cs;
> +}
> +
[ ... ]
> @@ -1387,7 +1422,13 @@ gen12_emit_indirect_ctx_xcs(const struct intel_context 
> *ce, u32 *cs)
>                                                   
> PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE,
>                                                   0);
>  
> -     return gen12_emit_aux_table_inv(ce->engine, cs);
> +     cs = gen12_emit_aux_table_inv(ce->engine, cs);
> +
> +     /* Wa_22013059131:dg2 */
> +     if (IS_DG2_G11(ce->engine->i915))
> +             cs = dg2_g11_emit_wa_22013059131(ce, cs);

[Severity: High]
Does gen12_emit_indirect_ctx_xcs() incorrectly emit a Render/Compute register
write on Video (VCS) and Copy (BCS) engines?

Since gen12_emit_indirect_ctx_xcs() is called to build the indirect context
batch buffer for all non-Render engines, this unconditionally emits the
MI_LOAD_REGISTER_IMM to LSC_CHICKEN_BIT_0.

As LSC (Load/Store Cache) is part of the Xe-Core (Render/Compute domain),
Video and Blitter engines don't have access to this register domain. Will this
invalid cross-domain hardware programming cause command exceptions, dropped
writes, or command streamer hangs?

Would it be better to add an engine class check here, such as
(ce->engine->class == COMPUTE_CLASS)?

> +
> +     return cs;
>  }

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to