On 27-Feb-26 1:36 PM, Wang, Yang(Kevin) wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
The KMD should be responsible for the results it returns (e.g. to user mode
applications) rather than simply forwarding firmware values.
So I totally disagree with your point. We need the right person to review
this; no further discussion with you is needed.
KMD doesn't know anything about the actual utilization. If you want to
handle it correctly, either return an error (because the utilization
values are improper) or fix at the proper place and return the correct
values.
Blind clamping is not the way to fix it.
Thanks,
Lijo
Best Regards,
Kevin
-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Friday, February 27, 2026 15:55
To: Wang, Yang(Kevin) <[email protected]>; Alex Deucher
<[email protected]>
Cc: [email protected]; Deucher, Alexander <[email protected]>; Zhang,
Hawking <[email protected]>; Feng, Kenneth <[email protected]>
Subject: Re: [PATCH] drm/amd/pm: restrict sensor load values to 0-100
On 27-Feb-26 1:15 PM, Wang, Yang(Kevin) wrote:
This is not a workaround; you have misunderstood the intent of this patch.
All ASIC load sensors must be constrained to the 0–100 range.
In other words, the KMD driver must not blindly trust the value returned by the
firmware without validation.
For example, invalid values may arise from issues such as memory corruption.
We have many users who really care about the validity of the utilization
values. If firmware returns garbage like 65535 and the driver clamps it to show
100% utilization, that is not the real value. We don't want to chase ghost
utilization bugs because of this. If there are issues with corruption, fix them
in the right place, but keep the integrity of the utilization values.
Thanks,
Lijo
Best Regards,
Kevin
-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Friday, February 27, 2026 13:40
To: Wang, Yang(Kevin) <[email protected]>; Alex Deucher
<[email protected]>
Cc: [email protected]; Deucher, Alexander
<[email protected]>; Zhang, Hawking <[email protected]>;
Feng, Kenneth <[email protected]>
Subject: Re: [PATCH] drm/amd/pm: restrict sensor load values to 0-100
On 27-Feb-26 10:14 AM, Wang, Yang(Kevin) wrote:
Ping...
Please restrict this workaround to the affected SOC. Otherwise, if there are
bogus values, we will fix it at the right place.
Thanks,
Lijo
Best Regards,
Kevin
-----Original Message-----
From: Alex Deucher <[email protected]>
Sent: Wednesday, February 25, 2026 10:24 PM
To: Lazar, Lijo <[email protected]>
Cc: Wang, Yang(Kevin) <[email protected]>;
[email protected]; Deucher, Alexander
<[email protected]>; Zhang, Hawking <[email protected]>;
Feng, Kenneth <[email protected]>
Subject: Re: [PATCH] drm/amd/pm: restrict sensor load values to 0-100
On Wed, Feb 25, 2026 at 7:14 AM Lazar, Lijo <[email protected]> wrote:
On 25-Feb-26 3:04 PM, Yang Wang wrote:
Limit GPU/MEM/VCN load sensor values to 0-100 range via clamp_t to
ensure validity.
Is this a workaround? If it's not within range, it indicates some
underlying issue.
Likely for:
https://gitlab.freedesktop.org/drm/amd/-/issues/4905
Alex
Thanks,
Lijo
Signed-off-by: Yang Wang <[email protected]>
---
drivers/gpu/drm/amd/pm/amdgpu_pm.c | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 938361ecae05..86ef1ffbf1dd 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -1414,20 +1414,39 @@ static ssize_t amdgpu_set_pp_power_profile_mode(struct device *dev,
static int amdgpu_pm_get_sensor_generic(struct amdgpu_device *adev,
enum amd_pp_sensors sensor,
- void *query)
+ uint32_t *val)
{
- int r, size = sizeof(uint32_t);
+ uint32_t tmp = UINT_MAX, size = sizeof(tmp);
+ int r;
+
+ if (!val)
+ return -EINVAL;
r = amdgpu_pm_get_access_if_active(adev);
if (r)
return r;
/* get the sensor value */
- r = amdgpu_dpm_read_sensor(adev, sensor, query, &size);
+ r = amdgpu_dpm_read_sensor(adev, sensor, (void *)&tmp,
+ &size);
amdgpu_pm_put_access(adev);
- return r;
+ if (r)
+ return r;
+
+ switch (sensor) {
+ case AMDGPU_PP_SENSOR_GPU_LOAD:
+ case AMDGPU_PP_SENSOR_MEM_LOAD:
+ case AMDGPU_PP_SENSOR_VCN_LOAD:
+ tmp = clamp_t(uint32_t, tmp, 0, 100);
+ break;
+ default:
+ break;
+ }
+
+ *val = tmp;
+
+ return 0;
}
/**
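
For readers comparing the two positions in this thread, the following is a
minimal user-space C sketch; it is not kernel code and not part of the patch.
It contrasts clamping an out-of-range load value (what the patch does with
clamp_t) with rejecting it (one possible reading of the "return an error"
suggestion). The helper names load_clamp and load_validate, and the choice of
-ERANGE as the error code, are assumptions made purely for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound for a percentage-style load sensor. */
#define LOAD_MAX 100u

/*
 * Clamping approach (mirrors the effect of clamp_t(uint32_t, tmp, 0, 100)):
 * any value above 100 is silently reported as 100.
 */
static uint32_t load_clamp(uint32_t raw)
{
	return raw > LOAD_MAX ? LOAD_MAX : raw;
}

/*
 * Validation approach (assumed alternative): an out-of-range value is
 * treated as an error and the caller's output is left untouched.
 * -ERANGE is an illustrative choice, not something the thread specifies.
 */
static int load_validate(uint32_t raw, uint32_t *val)
{
	if (!val)
		return -EINVAL;
	if (raw > LOAD_MAX)
		return -ERANGE;

	*val = raw;
	return 0;
}
```

With a raw reading of 65535, load_clamp reports 100, while load_validate
returns -ERANGE and does not modify the caller's value. That difference in
what user space ends up seeing is the point under debate above.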