My current theory is that this masks interrupt delivery to the local CPU
during a critical phase. It purely papers over the symptoms with a delay
plucked out of thin air from testing on tgl1-gem.

Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
Cc: Andi Shyti <andi.sh...@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fa385218ce92..fe8f4625f04f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1186,6 +1186,21 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
        /* we need to manually load the submit queue */
        if (execlists->ctrl_reg)
                writel(EL_CTRL_LOAD, execlists->ctrl_reg);
+
+       /*
+        * Now this is evil magic.
+        *
+        * Adding the same udelay() to process_csb before we clear
+        * execlists->pending (that is after we receive the HW ack for this
+        * submit and before we can submit again) does not relieve the symptoms
+        * (machine lockup). So is the active difference here the wait under
+        * the irq-off spinlock? That gives more credence to the theory that
+        * the issue is interrupt delivery. Also note that we still rely on
+        * disabling RPS, again that seems like an issue with simultaneous
+        * GT interrupts being delivered to the same CPU.
+        */
+       if (IS_TIGERLAKE(engine->i915))
+               udelay(250);
 }
 
 static bool ctx_single_port_submission(const struct intel_context *ce)
-- 
2.23.0