[AMD Official Use Only]

Let's replace "RAS poison mode" with "poison is created, no user action is
needed". Other than that, the patch is

Reviewed-by: Hawking Zhang <hawking.zh...@amd.com>

Regards,
Hawking
-----Original Message-----
From: Zhou1, Tao <tao.zh...@amd.com> 
Sent: Wednesday, September 22, 2021 18:33
To: amd-gfx@lists.freedesktop.org; Zhang, Hawking <hawking.zh...@amd.com>; 
Clements, John <john.cleme...@amd.com>; Yang, Stanley <stanley.y...@amd.com>
Cc: Zhou1, Tao <tao.zh...@amd.com>
Subject: [PATCH 3/3] drm/amdgpu: skip umc ras irq handling in poison mode (v2)

In RAS poison mode, a UMC uncorrectable error is ignored until the corrupted
data is consumed by another RAS module (such as gfx or sdma).

v2: simplify the debug message and replace dev_warn with dev_info.

Signed-off-by: Tao Zhou <tao.zh...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 33 ++++++++++++++-----------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 5b362e944541..6fad3e1b8c94 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1544,22 +1544,27 @@ static void amdgpu_ras_interrupt_handler(struct ras_manager *obj)
                data->rptr = (data->aligned_element_size +
                                data->rptr) % data->ring_size;
 
-               /* Let IP handle its data, maybe we need get the output
-                * from the callback to udpate the error type/count, etc
-                */
                if (data->cb) {
-                       ret = data->cb(obj->adev, &err_data, &entry);
-                       /* ue will trigger an interrupt, and in that case
-                        * we need do a reset to recovery the whole system.
-                        * But leave IP do that recovery, here we just dispatch
-                        * the error.
-                        */
-                       if (ret == AMDGPU_RAS_SUCCESS) {
-                               /* these counts could be left as 0 if
-                                * some blocks do not count error number
+                       if (amdgpu_ras_is_poison_supported(obj->adev) &&
+                           obj->head.block == AMDGPU_RAS_BLOCK__UMC)
+                               dev_info(obj->adev->dev, "RAS poison mode\n");
+                       else {
+                               /* Let IP handle its data, maybe we need get the output
+                                * from the callback to udpate the error type/count, etc
+                                */
+                               ret = data->cb(obj->adev, &err_data, &entry);
+                               /* ue will trigger an interrupt, and in that case
+                                * we need do a reset to recovery the whole system.
+                                * But leave IP do that recovery, here we just dispatch
+                                * the error.
                                 */
-                               obj->err_data.ue_count += err_data.ue_count;
-                               obj->err_data.ce_count += err_data.ce_count;
+                               if (ret == AMDGPU_RAS_SUCCESS) {
+                                       /* these counts could be left as 0 if
+                                        * some blocks do not count error number
+                                        */
+                                       obj->err_data.ue_count += err_data.ue_count;
+                                       obj->err_data.ce_count += err_data.ce_count;
+                               }
                        }
                }
        }
--
2.17.1
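
For readers who want to see the control flow outside the driver, the branch added by the hunk above can be sketched as a standalone C fragment. Everything here is a hypothetical stand-in: the enum, the structs, the callback type, `dispatch_one`, and `report_one_ue` are illustrative names, not the real amdgpu definitions, and the sketch assumes a callback returns 0 on success. It only shows the decision the patch makes: when poison is supported and the block is UMC, the callback is skipped and no counts are accumulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types (not the amdgpu definitions). */
enum ras_block { RAS_BLOCK_UMC, RAS_BLOCK_GFX, RAS_BLOCK_SDMA };

struct err_counts {
    unsigned long ue_count; /* uncorrectable errors */
    unsigned long ce_count; /* correctable errors */
};

/* Assumed callback shape: returns 0 on success and fills in the counts. */
typedef int (*ras_cb)(struct err_counts *out);

struct ras_obj {
    enum ras_block block;
    bool poison_supported;
    ras_cb cb;
    struct err_counts totals;
};

/* Handle one ring entry. Returns true if the callback ran, false if skipped. */
static bool dispatch_one(struct ras_obj *obj)
{
    struct err_counts err = {0, 0};

    assert(obj != NULL);
    if (!obj->cb)
        return false;

    /* Poison mode: defer the UMC error until a consumer block reports it. */
    if (obj->poison_supported && obj->block == RAS_BLOCK_UMC)
        return false;

    /* Otherwise let the block handle its data and accumulate its counts. */
    if (obj->cb(&err) == 0) {
        obj->totals.ue_count += err.ue_count;
        obj->totals.ce_count += err.ce_count;
    }
    return true;
}

/* Example callback that reports a single uncorrectable error. */
static int report_one_ue(struct err_counts *out)
{
    out->ue_count = 1;
    return 0;
}
```

With `poison_supported` set, calling `dispatch_one` on an object whose block is `RAS_BLOCK_UMC` leaves its totals untouched, while the same call on a `RAS_BLOCK_GFX` object runs the callback and adds its counts.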
