ELK seems to be very picky about the preconditions for reset.
Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does
not like it if a reset occurs while a ring is active.

Ville found out that there is a workaround named
'WaMediaResetMainRingCleanup', which suggests that we need to
clean up the rings before resetting. It is unclear what cleanup
exactly means, but evidence shows that stopping the ring
does have an effect on reset reliability. This patch makes
reset successful on hangs induced by chained batches (the igt ones).
Note that if the hang is inside a shader, it is possible
that our attempts to stop the ring will not achieve anything.

v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942
Testcase: igt/gem_busy/*-hang
Testcase: igt/gem_ringfill/hang-*
Suggested-by: Ville Syrjälä <[email protected]>
Cc: Ville Syrjälä <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Tomi Sarvela <[email protected]>
Signed-off-by: Mika Kuoppala <[email protected]>
---
 drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
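
Note on the STOP_RING write: RING_MI_MODE is a masked register, so
the upper 16 bits of the written value select which of the lower 16
bits are updated. The standalone sketch below models that behaviour
in user space. It is not driver code: every sketch_* name is
hypothetical, and the bit positions used for STOP_RING and MODE_IDLE
are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the bit positions are assumptions. */
#define SKETCH_STOP_RING (1u << 8)
#define SKETCH_MODE_IDLE (1u << 9)

/* Value that sets 'bit': mask in the upper half, bit in the lower half. */
static uint32_t sketch_masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

/* Value that clears 'bit': mask in the upper half, zero in the lower half. */
static uint32_t sketch_masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

/*
 * Model of a masked register write: only the low bits whose mask bit
 * (same position in the upper half of 'val') is set are changed.
 */
static uint32_t sketch_apply_masked_write(uint32_t reg, uint32_t val)
{
	uint32_t mask = val >> 16;
	uint32_t bits = val & 0xffffu;

	return (reg & ~mask & 0xffffu) | (bits & mask);
}

/* Setting the stop bit must leave an already set idle bit untouched. */
static void sketch_self_check(void)
{
	uint32_t reg = SKETCH_MODE_IDLE;

	reg = sketch_apply_masked_write(reg,
			sketch_masked_bit_enable(SKETCH_STOP_RING));
	assert(reg == (SKETCH_MODE_IDLE | SKETCH_STOP_RING));
}
```

The point of the masked form is that a single write can set or clear
one bit without a read-modify-write cycle on the register.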

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 7eaaf2225e1a..43da84be0321 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1427,6 +1427,35 @@ int i915_reg_read_ioctl(struct drm_device *dev,
        return ret;
 }
 
+static void gen3_stop_rings(struct drm_i915_private *dev_priv)
+{
+       struct intel_engine_cs *engine;
+       enum intel_engine_id id;
+
+       for_each_engine(engine, dev_priv, id) {
+               const u32 base = engine->mmio_base;
+               const i915_reg_t mode = RING_MI_MODE(base);
+
+               I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+               if (intel_wait_for_register_fw(dev_priv,
+                                              mode,
+                                              MODE_IDLE,
+                                              MODE_IDLE,
+                                              500))
+                       DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+                                        engine->name);
+
+               I915_WRITE_FW(RING_CTL(base), 0);
+               I915_WRITE_FW(RING_HEAD(base), 0);
+               I915_WRITE_FW(RING_TAIL(base), 0);
+
+               /* Check acts as a post */
+               if (I915_READ_FW(RING_HEAD(base)) != 0)
+                       DRM_DEBUG_DRIVER("%s: ring head not parked\n",
+                                        engine->name);
+       }
+}
+
 static bool i915_reset_complete(struct pci_dev *pdev)
 {
        u8 gdrst;
@@ -1472,6 +1501,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
        I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
        POSTING_READ(VDECCLK_GATE_D);
 
+       /* We stop engines, otherwise we might get a failed reset and a
+        * dead gpu (on elk).
+        */
+       /* WaMediaResetMainRingCleanup:ctg,elk (supposedly) */
+       gen3_stop_rings(dev_priv);
+
        pci_write_config_byte(pdev, I915_GDRST,
                              GRDOM_MEDIA | GRDOM_RESET_ENABLE);
        ret =  wait_for(g4x_reset_complete(pdev), 500);
-- 
2.11.0
