Sometimes Icelake forgets to reset the CSB pointers on a GPU reset,
leading to carry on updating the old tail of the buffer.
<0>[ 618.138490] i915_sel-5636 3d..1 673425465us : trace_ports: vecs0:
submit { 14de2:504, 0:0 }
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : intel_engine_reset:
vecs0 flags=100
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : execlists_reset_prepare:
vecs0: depth<-0
<0>[ 618.138490] i915_sel-5636 3.... 673425493us : intel_engine_stop_cs:
vecs0
<0>[ 618.138490] i915_sel-5636 3.... 673425523us : __intel_gt_reset:
engine_mask=40
<0>[ 618.138490] i915_sel-5636 3.... 673425568us : execlists_reset: vecs0
<0>[ 618.138490] i915_sel-5636 3d..1 673425568us : process_csb: vecs0
cs-irq head=1, tail=2
<0>[ 618.138490] i915_sel-5636 3d..1 673425568us : process_csb: vecs0
csb[2]: status=0x00000001:0x40000000
<0>[ 618.138490] i915_sel-5636 3d..1 673425569us : trace_ports: vecs0:
promote { 14de2:504*, 0:0 }
<0>[ 618.138490] i915_sel-5636 3d..1 673425570us : __i915_request_reset:
vecs0 rq=14de2:504, guilty? yes
<0>[ 618.138490] i915_sel-5636 3d..1 673425571us : __execlists_reset: vecs0
replay {head:2de0, tail:2e48}
<0>[ 618.138490] i915_sel-5636 3d..1 673425572us : __i915_request_unsubmit:
vecs0 fence 14de2:504, current 503
<0>[ 618.138490] i915_sel-5636 3.... 673435544us :
intel_engine_cancel_stop_cs: vecs0
<0>[ 618.138490] i915_sel-5636 3.... 673435544us : process_csb: vecs0
cs-irq head=11, tail=11
<0>[ 618.138490] i915_sel-5636 3d..1 673435545us : __i915_request_submit:
vecs0 fence 14de2:504, current 503
<0>[ 618.138490] i915_sel-5636 3d..1 673435546us :
__execlists_submission_tasklet: vecs0: queue_priority_hint:-2147483648,
submit:yes
<0>[ 618.138490] i915_sel-5636 3d..1 673435548us : trace_ports: vecs0:
submit { 14de2:504*, 0:0 }
<0>[ 618.138490] i915_sel-5636 3.... 673435549us : execlists_reset_finish:
vecs0: depth->0
<0>[ 618.138490] ksoftirq-21 2..s. 673435592us : process_csb: vecs0
cs-irq head=11, tail=3
<0>[ 618.138490] ksoftirq-21 2..s. 673435593us : process_csb: vecs0
csb[0]: status=0x00000001:0x40000000
<0>[ 618.138490] ksoftirq-21 2..s. 673435594us : trace_ports: vecs0:
promote { 14de2:504*, 0:0 }
<0>[ 618.138490] ksoftirq-21 2..s. 673435596us : process_csb: vecs0
csb[1]: status=0x00000018:0x40000040
<0>[ 618.138490] ksoftirq-21 2..s. 673435597us : trace_ports: vecs0:
completed { 14de2:504*, 0:0 }
<0>[ 618.138490] ksoftirq-21 2..s. 673435612us : process_csb:
process_csb:2188 GEM_BUG_ON(!i915_request_completed(*execlists->active) &&
!reset_in_progress(execlists))
After the reset, we do another clflush before checking the CSB to be
sure we see whatever was left in the CSB prior to the reset. So it is
unlikely to be an incoherent view of the CSB, and more likely that
Icelake didn't reset its pointers.
References: 582a6f90aa0d ("drm/i915/execlists: Add a paranoid flush of the CSB
pointers upon reset")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
---
drivers/gpu/drm/i915/gt/intel_lrc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c
b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 8d79a965f341..b869a9c7b3c6 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2996,6 +2996,14 @@ static void reset_csb_pointers(struct intel_engine_cs
*engine)
WRITE_ONCE(*execlists->csb_write, reset_value);
wmb(); /* Make sure this is visible to HW (paranoia?) */
+ /*
+ * Sometimes Icelakes forgets to reset its pointers on a GPU reset.
+ * Bludgeon them with a mmio update to be sure.
+ */
+ ENGINE_WRITE(engine, RING_CONTEXT_STATUS_PTR,
+ reset_value << 8 | reset_value);
+ ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);
+
invalidate_csb_entries(&execlists->csb_status[0],
&execlists->csb_status[reset_value]);
}
--
2.24.0.rc2
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx