On 27-Feb-26 1:36 PM, Wang, Yang(Kevin) wrote:
[AMD Official Use Only - AMD Internal Distribution Only]

The KMD should be responsible (e.g., to user-mode applications) for the results
it returns, rather than simply forwarding firmware values.
So, I totally disagree with your point. We need the right person to review
this; no further discussion with you is needed.


The KMD doesn't know anything about the actual utilization. If you want to handle it correctly, either return an error (because the utilization values are improper) or fix it at the proper place and return the correct values.

Blind clamping is not the way to fix it.
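A minimal sketch of the "return an error" option, written as plain userspace C so it compiles on its own. The helper name `validate_load_percent` and the choice of `-ERANGE` are assumptions for illustration only; they are not existing amdgpu code or an agreed errno.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: pass a firmware-reported load value through only
 * if it is a valid percentage. Out-of-range values are reported as an
 * error instead of being rewritten, so userspace never sees a fabricated
 * utilization number. -ERANGE is an assumed errno choice.
 */
static int validate_load_percent(uint32_t raw, uint32_t *out)
{
	if (!out)
		return -EINVAL;
	if (raw > 100)
		return -ERANGE;   /* surface the firmware problem, leave *out untouched */
	*out = raw;
	return 0;
}
```

With this approach a reading of 65535 produces an error rather than a plausible-looking 100, so the bogus value stays visible to whoever is debugging it.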

Thanks,
Lijo

Best Regards,
Kevin
-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Friday, February 27, 2026 15:55
To: Wang, Yang(Kevin) <[email protected]>; Alex Deucher 
<[email protected]>
Cc: [email protected]; Deucher, Alexander <[email protected]>; Zhang, 
Hawking <[email protected]>; Feng, Kenneth <[email protected]>
Subject: Re: [PATCH] drm/amd/pm: restrict sensor load values to 0-100



On 27-Feb-26 1:15 PM, Wang, Yang(Kevin) wrote:

This is not a workaround; you have misunderstood the intent of this patch.
All ASIC load sensors must be constrained to the 0–100 range.
In other words, the KMD must not blindly trust values returned by the
firmware without validating them.
For example, invalid values may arise from issues such as memory corruption.


We have many users who really care about the validity of the utilization
values. If the firmware returns garbage such as 65535 and the driver clamps it
to show 100% utilization, that value is meaningless. We don't want to chase
ghost utilization bugs because of this. If there are issues with corruption,
fix them in the right place, but keep the integrity of the utilization values.
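To make the concern concrete: a clamp in the style of the patch's `clamp_t(uint32_t, tmp, 0, 100)` maps a garbage reading and a genuine 100% reading to the same output, so the reported value alone can no longer distinguish them. The sketch below reimplements the clamp in plain C; `clamp_load` is a hypothetical stand-in for the kernel macro, not kernel code.

```c
#include <stdint.h>

/* Userspace stand-in for clamp_t(uint32_t, v, lo, hi) from the kernel. */
static uint32_t clamp_load(uint32_t v, uint32_t lo, uint32_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}
```

Here `clamp_load(65535, 0, 100)` and `clamp_load(100, 0, 100)` both yield 100, which is exactly the ambiguity being objected to: a fully loaded GPU and a corrupted sensor read look identical to the user.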

Thanks,
Lijo

Best Regards,
Kevin

-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Friday, February 27, 2026 13:40
To: Wang, Yang(Kevin) <[email protected]>; Alex Deucher
<[email protected]>
Cc: [email protected]; Deucher, Alexander
<[email protected]>; Zhang, Hawking <[email protected]>;
Feng, Kenneth <[email protected]>
Subject: Re: [PATCH] drm/amd/pm: restrict sensor load values to 0-100


On 27-Feb-26 10:14 AM, Wang, Yang(Kevin) wrote:

Ping...


Please restrict this workaround to the affected SoC. Otherwise, if there are
bogus values, we will fix them at the right place.
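One way to read "restrict this workaround to the affected SoC" is to gate the clamp on a per-device quirk flag, leaving every other ASIC to pass firmware values through unchanged. The sketch below is hypothetical throughout: `struct fake_dev`, the `load_clamp_quirk` field, and `report_load` do not exist in amdgpu, and how the affected SoC would be identified is left open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device descriptor; not an amdgpu structure. */
struct fake_dev {
	bool load_clamp_quirk;  /* set only for the SoC known to misreport */
};

/*
 * Apply the 0-100 clamp only on devices flagged with the quirk;
 * everywhere else the firmware value is returned untouched so that
 * bogus readings stay visible and can be fixed at their source.
 */
static uint32_t report_load(const struct fake_dev *dev, uint32_t raw)
{
	if (dev->load_clamp_quirk && raw > 100)
		return 100;
	return raw;
}
```

The design choice here is to keep the workaround's blast radius small: only the flagged device has its readings rewritten, so a regression on any other ASIC still shows up as an obviously wrong value.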

Thanks,
Lijo

Best Regards,
Kevin

-----Original Message-----
From: Alex Deucher <[email protected]>
Sent: Wednesday, February 25, 2026 10:24 PM
To: Lazar, Lijo <[email protected]>
Cc: Wang, Yang(Kevin) <[email protected]>;
[email protected]; Deucher, Alexander
<[email protected]>; Zhang, Hawking <[email protected]>;
Feng, Kenneth <[email protected]>
Subject: Re: [PATCH] drm/amd/pm: restrict sensor load values to 0-100

On Wed, Feb 25, 2026 at 7:14 AM Lazar, Lijo <[email protected]> wrote:



On 25-Feb-26 3:04 PM, Yang Wang wrote:
Limit GPU/MEM/VCN load sensor values to the 0-100 range via clamp_t to
ensure validity.


Is this a workaround? If it's not within range, it indicates some
underlying issue.

Likely for:
https://gitlab.freedesktop.org/drm/amd/-/issues/4905

Alex


Thanks,
Lijo

Signed-off-by: Yang Wang <[email protected]>
---
     drivers/gpu/drm/amd/pm/amdgpu_pm.c | 27 +++++++++++++++++++++++----
     1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 938361ecae05..86ef1ffbf1dd 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -1414,20 +1414,39 @@ static ssize_t amdgpu_set_pp_power_profile_mode(struct device *dev,

     static int amdgpu_pm_get_sensor_generic(struct amdgpu_device *adev,
                                         enum amd_pp_sensors sensor,
-                                     void *query)
+                                     uint32_t *val)
     {
-     int r, size = sizeof(uint32_t);
+     uint32_t tmp = UINT_MAX, size = sizeof(tmp);
+     int r;
+
+     if (!val)
+             return -EINVAL;

         r = amdgpu_pm_get_access_if_active(adev);
         if (r)
                 return r;

         /* get the sensor value */
-     r = amdgpu_dpm_read_sensor(adev, sensor, query, &size);
+     r = amdgpu_dpm_read_sensor(adev, sensor, (void *)&tmp, &size);

         amdgpu_pm_put_access(adev);

-     return r;
+     if (r)
+             return r;
+
+     switch (sensor) {
+     case AMDGPU_PP_SENSOR_GPU_LOAD:
+     case AMDGPU_PP_SENSOR_MEM_LOAD:
+     case AMDGPU_PP_SENSOR_VCN_LOAD:
+             tmp = clamp_t(uint32_t, tmp, 0, 100);
+             break;
+     default:
+             break;
+     }
+
+     *val = tmp;
+
+     return 0;
     }

     /**



