[AMD Official Use Only - AMD Internal Distribution Only]

Ping ...

> -----Original Message-----
> From: Jesse.Zhang <[email protected]>
> Sent: Tuesday, September 16, 2025 4:08 PM
> To: [email protected]
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>; Lazar, Lijo <[email protected]>; Zhang,
> Jesse(Jie) <[email protected]>
> Subject: [PATCH v3] drm/amdgpu: Add fallback to pipe reset if KCQ ring reset 
> fails
>
> From: Lijo Lazar <[email protected]>
>
> Add a fallback mechanism to attempt pipe reset when KCQ reset fails to recover
> the ring. After performing the KCQ reset and queue remapping, test the ring
> functionality. If the ring test fails, initiate a pipe reset as an additional 
> recovery step.
>
> v2: fix the typo (Lijo)
> v3: try pipeline reset when kiq mapping fails (Lijo)
>
> Signed-off-by: Lijo Lazar <[email protected]>
> Signed-off-by: Jesse Zhang <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> index 8ba66d4dfe86..77f9d5b9a556 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> @@ -3560,6 +3560,7 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
>       struct amdgpu_device *adev = ring->adev;
>       struct amdgpu_kiq *kiq = &adev->gfx.kiq[ring->xcc_id];
>       struct amdgpu_ring *kiq_ring = &kiq->ring;
> +     int reset_mode = AMDGPU_RESET_TYPE_PER_QUEUE;
>       unsigned long flags;
>       int r;
>
> @@ -3597,6 +3598,7 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
>               if (!(adev->gfx.compute_supported_reset & AMDGPU_RESET_TYPE_PER_PIPE))
>                       return -EOPNOTSUPP;
>               r = gfx_v9_4_3_reset_hw_pipe(ring);
> +             reset_mode = AMDGPU_RESET_TYPE_PER_PIPE;
>               dev_info(adev->dev, "ring: %s pipe reset :%s\n", ring->name,
>                               r ? "failed" : "successfully");
>               if (r)
> @@ -3619,10 +3621,20 @@ static int gfx_v9_4_3_reset_kcq(struct amdgpu_ring *ring,
>       r = amdgpu_ring_test_ring(kiq_ring);
>       spin_unlock_irqrestore(&kiq->ring_lock, flags);
>       if (r) {
> +             if (reset_mode == AMDGPU_RESET_TYPE_PER_QUEUE)
> +                     goto pipe_reset;
> +
>               dev_err(adev->dev, "fail to remap queue\n");
>               return r;
>       }
>
> +     if (reset_mode == AMDGPU_RESET_TYPE_PER_QUEUE) {
> +             r = amdgpu_ring_test_ring(ring);
> +             if (r)
> +                     goto pipe_reset;
> +     }
> +
> +
>       return amdgpu_ring_reset_helper_end(ring, timedout_fence);
>  }
>
> --
> 2.49.0
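
For anyone picking up the thread, here is a rough standalone model of the control flow the quoted patch seems to be aiming for: try the per-queue reset first, and escalate to a pipe reset at most once if the queue reset, the KIQ ring test after remapping, or the compute ring test fails. This is only a sketch and not the driver code. `struct fake_hw`, its fields, `remap_ok()`, `reset_kcq_sketch()` and the `SKETCH_*` return codes are all hypothetical names invented for illustration; the real function works on `struct amdgpu_ring` and returns kernel error codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for AMDGPU_RESET_TYPE_PER_QUEUE / _PER_PIPE. */
enum reset_mode { RESET_PER_QUEUE, RESET_PER_PIPE };

/* Illustrative return codes; the driver uses kernel errnos instead. */
enum { SKETCH_OK = 0, SKETCH_EFAIL = -1, SKETCH_ENOTSUP = -2 };

/* Hypothetical fake hardware: each flag says whether a step succeeds. */
struct fake_hw {
	bool pipe_reset_supported;
	bool queue_reset_ok;
	bool remap_ok_after_queue;   /* KIQ ring test after a queue reset */
	bool ring_ok_after_queue;    /* compute ring test after a queue reset */
	bool pipe_reset_ok;
	bool remap_ok_after_pipe;    /* KIQ ring test after a pipe reset */
	int pipe_resets;             /* how many pipe resets were issued */
};

/* Models the KIQ ring test that follows queue remapping. */
static bool remap_ok(const struct fake_hw *hw, enum reset_mode mode)
{
	return mode == RESET_PER_QUEUE ? hw->remap_ok_after_queue
				       : hw->remap_ok_after_pipe;
}

int reset_kcq_sketch(struct fake_hw *hw)
{
	enum reset_mode mode = RESET_PER_QUEUE;
	int r;

	assert(hw != NULL);

	/* Step 1: per-queue reset. */
	r = hw->queue_reset_ok ? SKETCH_OK : SKETCH_EFAIL;

pipe_reset:
	if (r) {
		if (!hw->pipe_reset_supported)
			return SKETCH_ENOTSUP;
		hw->pipe_resets++;
		mode = RESET_PER_PIPE;
		r = hw->pipe_reset_ok ? SKETCH_OK : SKETCH_EFAIL;
		if (r)
			return r;
	}

	/* Step 2: remap the queue, then test the KIQ ring. */
	r = remap_ok(hw, mode) ? SKETCH_OK : SKETCH_EFAIL;
	if (r) {
		if (mode == RESET_PER_QUEUE)
			goto pipe_reset; /* escalate instead of giving up */
		return r;
	}

	/* Step 3: only after a per-queue reset, test the compute ring too. */
	if (mode == RESET_PER_QUEUE) {
		r = hw->ring_ok_after_queue ? SKETCH_OK : SKETCH_EFAIL;
		if (r)
			goto pipe_reset;
	}

	return SKETCH_OK;
}
```

Because `mode` switches to `RESET_PER_PIPE` before any second pass, both `goto pipe_reset` paths can be taken at most once, so the model cannot loop.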
