
Might consider leveraging the is_RMA flag for the same purpose?

Regards,
Hawking

-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Tao Zhou
Sent: Wednesday, July 31, 2024 18:05
To: [email protected]
Cc: Zhou1, Tao <[email protected]>
Subject: [PATCH] drm/amdgpu: report bad status in GPU recovery

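Skip the "GPU reset failed" message in GPU recovery when the RAS error threshold has been exceeded; in that case the GPU bad status should be reported instead.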

Signed-off-by: Tao Zhou <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 355c2478c4b6..b7c967779b4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5933,8 +5933,13 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
                tmp_adev->asic_reset_res = 0;

                if (r) {
-                       /* bad news, how to tell it to userspace ? */
-                       dev_info(tmp_adev->dev, "GPU reset(%d) failed\n", atomic_read(&tmp_adev->gpu_reset_counter));
+                       /* bad news, how to tell it to userspace ?
+                        * for ras error, we should report GPU bad status instead of
+                        * reset failure
+                        */
+                       if (!amdgpu_ras_eeprom_check_err_threshold(tmp_adev))
+                               dev_info(tmp_adev->dev, "GPU reset(%d) failed\n",
+                                       atomic_read(&tmp_adev->gpu_reset_counter));
                        amdgpu_vf_error_put(tmp_adev, AMDGIM_ERROR_VF_GPU_RESET_FAIL, 0, r);
                } else {
                        dev_info(tmp_adev->dev, "GPU reset(%d) succeeded!\n", atomic_read(&tmp_adev->gpu_reset_counter));
--
2.34.1
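The reporting decision the patch adds can be modelled in isolation. This is a user-space sketch under stated assumptions: `struct fake_adev` and its `ras_threshold_hit` field are hypothetical stand-ins for the device state and for the return value of `amdgpu_ras_eeprom_check_err_threshold()`, and `printf` replaces `dev_info`. It shows that "GPU reset failed" is suppressed only when the reset fails *and* the threshold is exceeded, while the success message is unchanged. The patch still calls `amdgpu_vf_error_put()` on every failure; that call is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct amdgpu_device. */
struct fake_adev {
	int gpu_reset_counter;  /* stands in for atomic_t gpu_reset_counter */
	bool ras_threshold_hit; /* stands in for amdgpu_ras_eeprom_check_err_threshold() */
};

/* Which message, if any, the recovery path emits after a reset attempt. */
enum reset_report {
	REPORT_SUCCEEDED,
	REPORT_FAILED,
	REPORT_SUPPRESSED, /* RAS threshold exceeded: no "reset failed" message */
};

enum reset_report classify_reset(const struct fake_adev *adev, int r)
{
	assert(adev != NULL);
	if (r == 0)
		return REPORT_SUCCEEDED;
	/* Mirrors the patch: print "GPU reset failed" only when the
	 * RAS error threshold has NOT been exceeded. */
	return adev->ras_threshold_hit ? REPORT_SUPPRESSED : REPORT_FAILED;
}

void report_reset(const struct fake_adev *adev, int r)
{
	switch (classify_reset(adev, r)) {
	case REPORT_SUCCEEDED:
		printf("GPU reset(%d) succeeded!\n", adev->gpu_reset_counter);
		break;
	case REPORT_FAILED:
		printf("GPU reset(%d) failed\n", adev->gpu_reset_counter);
		break;
	case REPORT_SUPPRESSED:
		/* per the patch comment, the GPU bad status is meant to be
		 * reported instead of the reset failure */
		break;
	}
}
```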
