On Mon, Dec 14, 2015 at 11:28:39AM +0000, Dave Gordon wrote:
> On 11/12/15 22:59, Chris Wilson wrote:
> >Remove some redundant kernel messages as we deduce a hung GPU and
> >capture the error state.
> >
> >Signed-off-by: Chris Wilson <[email protected]>
> >---
> >  drivers/gpu/drm/i915/i915_irq.c | 16 ++++++----------
> >  1 file changed, 6 insertions(+), 10 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c 
> >b/drivers/gpu/drm/i915/i915_irq.c
> >index 4cfbd694b3a8..365d4872a67d 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -2963,7 +2963,7 @@ static void i915_hangcheck_elapsed(struct work_struct 
> >*work)
> >     struct drm_device *dev = dev_priv->dev;
> >     struct intel_engine_cs *ring;
> >     int i;
> >-    int busy_count = 0, rings_hung = 0;
> >+    int busy_count = 0;
> >     bool stuck[I915_NUM_RINGS] = { 0 };
> >  #define BUSY 1
> >  #define KICK 5
> >@@ -3057,17 +3057,13 @@ static void i915_hangcheck_elapsed(struct 
> >work_struct *work)
> >     }
> >
> >     for_each_ring(ring, dev_priv, i) {
> >-            if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG) {
> >-                    DRM_INFO("%s on %s\n",
> >-                             stuck[i] ? "stuck" : "no progress",
> >-                             ring->name);
> >-                    rings_hung++;
> >-            }
> >+            if (ring->hangcheck.score >= HANGCHECK_SCORE_RING_HUNG)
> >+                    return i915_handle_error(dev, true,
> >+                                             "%s on %s",
> >+                                             stuck[i] ? "No progress" : 
> >"Hang",
> >+                                             ring->name);
> >     }
> >
> >-    if (rings_hung)
> >-            return i915_handle_error(dev, true, "Ring hung");
> >-
> >     /* Reset timer in case GPU hangs without another request being added */
> >     if (busy_count)
> >             i915_queue_hangcheck(dev_priv);
> 
> This version provides less information (in dmesg & syslog) in the
> case that multiple rings have (been detected as) hung. Does this
> ever happen?

Not often. And intended, since that information is already in the error
state.

> Is the original more useful for finding bugs in
> hangcheck itself?

No. See i915_hangcheck_info.

> If this is a test that has disabled error state
> capture (because it's *trying* to hang one or more rings) can we
> still know how many rings have been diagnosed as hung?

If you have a use case, then you can look at the interface you require.
i915_hangcheck_info should be sufficient in most cases, or at least a
good starting point. But you may want a more specific debugfs to avoid
parsing.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to