Chris Wilson <ch...@chris-wilson.co.uk> writes:

> If we have a long period of idleness, we turn off the hangcheck timer
> and stop polling the hardware. Before we restart the hangcheck, we
> should clear the previous timestamps to prevent us thinking that the
> engine was stalled for a long time, if the seqno were manipulated
> carefully (such as the repeating patterns in gem_exec_whisper).
>
> It should have no impact upon normal use.
>
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_hangcheck.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index b0ca0c4c70d9..a74decca5109 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -409,13 +409,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>       int busy_count = 0;
>  
>       if (!i915.enable_hangcheck)
> -             return;
> +             goto disarm_hangcheck;
>  
>       if (!READ_ONCE(dev_priv->gt.awake))
> -             return;
> +             goto disarm_hangcheck;
>  
>       if (i915_terminally_wedged(&dev_priv->gpu_error))
> -             return;
> +             goto disarm_hangcheck;
>  
>       /* As enabling the GPU requires fairly extensive mmio access,
>        * periodically arm the mmio checker to see if we are triggering
> @@ -446,8 +446,14 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>               hangcheck_declare_hang(dev_priv, hung, stuck);
>  
>       /* Reset timer in case GPU hangs without another request being added */
> -     if (busy_count)
> +     if (busy_count) {
>               i915_queue_hangcheck(dev_priv);

With this change, whenever we don't have a waiter we always
reinitialise hangcheck, and so we never detect a hang in that case.
gem_busy@basic-default-hang demonstrates this.

I suggest we decouple the waiters completely from hangcheck:

-               const bool busy = intel_engine_has_waiter(engine);
+               const bool busy = engine->timeline->inflight_seqnos;
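
To illustrate why the choice of predicate matters, here is a rough
standalone model, not driver code. Every name in it (model_engine,
busy_rule, MODEL_HUNG_TICKS, and so on) is made up for the example, the
threshold of three ticks is arbitrary, and it ticks unconditionally
rather than re-arming a timer. It only assumes what the patch shows:
when no engine counts as busy, every engine's hangcheck state is
reinitialised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an engine; not the real struct intel_engine_cs. */
struct model_engine {
	bool has_waiter;            /* someone is sleeping on a seqno */
	unsigned int inflight;      /* requests submitted but not yet completed */
	unsigned int stalled_ticks; /* hangcheck ticks seen without progress */
};

/* The two candidate definitions of "busy" discussed in this thread. */
enum busy_rule { BUSY_IF_WAITER, BUSY_IF_INFLIGHT };

#define MODEL_HUNG_TICKS 3 /* arbitrary threshold for this model */

static bool model_engine_busy(const struct model_engine *e, enum busy_rule rule)
{
	return rule == BUSY_IF_WAITER ? e->has_waiter : e->inflight != 0;
}

/*
 * One hangcheck tick over engines that never make progress.
 * Returns true once a busy engine has stalled for MODEL_HUNG_TICKS ticks.
 */
static bool model_hangcheck_tick(struct model_engine *engines, size_t count,
				 enum busy_rule rule)
{
	unsigned int busy_count = 0;
	bool hung = false;
	size_t i;

	assert(engines != NULL);

	for (i = 0; i < count; i++) {
		if (!model_engine_busy(&engines[i], rule))
			continue;
		busy_count++;
		if (++engines[i].stalled_ticks >= MODEL_HUNG_TICKS)
			hung = true;
	}

	if (busy_count)
		return hung;

	/* Models the patch's disarm path: nothing busy, so drop all history. */
	for (i = 0; i < count; i++)
		engines[i].stalled_ticks = 0;
	return false;
}

/* Runs up to max_ticks ticks on one stuck engine that nobody waits on. */
static bool model_detects_hang(enum busy_rule rule, unsigned int max_ticks)
{
	struct model_engine engine = { .has_waiter = false, .inflight = 1 };
	unsigned int t;

	for (t = 0; t < max_ticks; t++)
		if (model_hangcheck_tick(&engine, 1, rule))
			return true;
	return false;
}
```

In this model, model_detects_hang(BUSY_IF_WAITER, n) stays false for any
n, because the stuck engine never counts as busy, while
model_detects_hang(BUSY_IF_INFLIGHT, n) turns true once n reaches
MODEL_HUNG_TICKS.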

-Mika

> +             return;
> +     }
> +
> +disarm_hangcheck:
> +     for_each_engine(engine, dev_priv, id)
> +             intel_engine_init_hangcheck(engine);
>  }
>  
>  void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
> -- 
> 2.11.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx