On 2022-03-21 at 23:17, Zhou1, Tao wrote:
[AMD Official Use Only]



-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Monday, March 21, 2022 7:21 PM
To: Zhou1, Tao <[email protected]>; [email protected]; Zhang,
Hawking <[email protected]>; Kuehling, Felix
<[email protected]>; Yang, Stanley <[email protected]>; Chai,
Thomas <[email protected]>
Subject: Re: [PATCH] drm/amdkfd: print unmap queue status for RAS poison
consumption (v2)



On 3/21/2022 3:08 PM, Tao Zhou wrote:
Print the status when the queue unmap passes, and also tell the user that
a gpu reset is triggered when we fall back to the legacy way.

v2: make the message more explicit.

Signed-off-by: Tao Zhou <[email protected]>
---
   drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 11 +++++++----
   1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index 56902b5bb7b6..32c451f21db7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -105,8 +105,6 @@ static void event_interrupt_poison_consumption(struct kfd_dev *dev,
        if (old_poison)
                return;

-       pr_warn("RAS poison consumption handling: client id %d\n", client_id);
-
        switch (client_id) {
        case SOC15_IH_CLIENTID_SE0SH:
        case SOC15_IH_CLIENTID_SE1SH:
@@ -130,10 +128,15 @@ static void event_interrupt_poison_consumption(struct kfd_dev *dev,
        /* resetting queue passes, do page retirement without gpu reset
         * resetting queue fails, fallback to gpu reset solution
         */
-       if (!ret)
+       if (!ret) {
+               pr_warn("RAS poison consumption, unmap queue flow succeeds: client id %d\n",
+                               client_id);
As discussed in another patch, I understand that pr_* is the legacy usage in
this file. But it won't be helpful in this case with multiple devices. I would
suggest changing to dev_info(): the messages here and below seem informational
about how this situation is handled rather than a warning of something bad.

Thanks,
Lijo
[Tao] I'll replace pr_warn with dev_info. I think we need a dedicated cleanup
to retire all pr_* format messages in amdgpu.
RAS poison consumption is a special event that should be paid attention to, so
I think a warning is also reasonable.

Or you could make the "unmap success" case a dev_info and the "gpu reset" case a dev_warn.

Either way, v3 of your patch looks good to me and is

Acked-by: Felix Kuehling <[email protected]>

Regards,
  Felix
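
As a rough illustration of the split suggested above (an informational
message when the queue unmap succeeds, a warning when falling back to gpu
reset), here is a minimal userspace C sketch. It is not the v3 patch and it
is not kernel code: struct fake_dev, enum log_level and poison_log() are
hypothetical names, and the device-name prefix only mimics what
dev_info()/dev_warn() add to a message in the kernel.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a device. In the kernel, dev_info() and
 * dev_warn() take a struct device * and prefix the message with the
 * device name; here the prefix is added by hand. */
struct fake_dev {
	const char *name;	/* illustrative only, e.g. a PCI address */
};

enum log_level { LOG_INFO, LOG_WARN };

/* Format the message for one poison consumption event into buf and
 * return the chosen level. ret == 0 models a successful queue unmap,
 * anything else models the fallback to gpu reset. */
static enum log_level poison_log(const struct fake_dev *dev, int ret,
				 int client_id, char *buf, size_t len)
{
	if (!ret) {
		snprintf(buf, len,
			 "%s: RAS poison consumption, unmap queue flow succeeds: client id %d",
			 dev->name, client_id);
		return LOG_INFO;
	}

	snprintf(buf, len,
		 "%s: RAS poison consumption, fallback to gpu reset flow: client id %d",
		 dev->name, client_id);
	return LOG_WARN;
}
```

With two such devices, each message carries its own device prefix, which is
the point raised about pr_* being unhelpful on multi-device systems.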



                amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, false);
-       else
+       } else {
+               pr_warn("RAS poison consumption, fallback to gpu reset flow: client id %d\n",
+                               client_id);
                amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, true);
+       }
   }

   static bool event_interrupt_isr_v9(struct kfd_dev *dev,
