Chris Wilson <[email protected]> writes:

> As we declare the GPU wedged if the reset fails, such a failure is quite
> terminal. Before taking that drastic action, let's sleep first and try
> again, in the hope that the hardware has quietened down and is then
> able to reset. After a few such attempts, it is fair to say that the HW
> is truly wedged.
>
> v2: Always print the failure message now; we precheck whether resets are
> disabled.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=104007
> Signed-off-by: Chris Wilson <[email protected]>
> Cc: Mika Kuoppala <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index e0f053f9c186..7faf20aff25a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1877,7 +1877,9 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>  {
>       struct i915_gpu_error *error = &i915->gpu_error;
>       int ret;
> +     int i;
>  
> +     might_sleep();
>       lockdep_assert_held(&i915->drm.struct_mutex);
>       GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
>  
> @@ -1900,12 +1902,20 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>               goto error;
>       }
>  
> -     ret = intel_gpu_reset(i915, ALL_ENGINES);
> +     if (!intel_has_gpu_reset(i915)) {
> +             DRM_DEBUG_DRIVER("GPU reset disabled\n");
> +             goto error;
> +     }
> +
> +     for (i = 0; i < 3; i++) {
> +             ret = intel_gpu_reset(i915, ALL_ENGINES);
> +             if (ret == 0)
> +                     break;
> +
> +             msleep(100);

Seems reasonable to try a few times and pause between defibrillation
attempts, instead of throwing dirt on top of the coffin right
off the bat.

Also, I have been pondering whether we should add a mini-check
to intel_gpu_reset() to verify that the GPU is really there.
For example, could we submit a few NOOPs to the (render) ringbuffer
and check whether the head moves before declaring the reset a success?

We would catch it in init right after anyway, but this would
give a more precise location of the failure instead of
initializing a dead GPU.

Reviewed-by: Mika Kuoppala <[email protected]>

-Mika


> +     }
>       if (ret) {
> -             if (ret != -ENODEV)
> -                     DRM_ERROR("Failed to reset chip: %i\n", ret);
> -             else
> -                     DRM_DEBUG_DRIVER("GPU reset disabled\n");
> +             dev_err(i915->drm.dev, "Failed to reset chip\n");
>               goto error;
>       }
>  
> -- 
> 2.15.1
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
