On Thu, Mar 26, 2026 at 7:15 PM Wang, Yang(Kevin)
<[email protected]> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> >> However, this wrong value will still affect how 'pp_dpm_uclk' and 
> >> 'gpu_metrics' show their results.
> change to
> However, this wrong value will still affect how 'pp_dpm_uclk','pp_dpm_fclk' 
> and 'gpu_metrics' show their results.
>
> Best Regards,
> Kevin
>
> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of Wang, 
> Yang(Kevin)
> Sent: Friday, March 27, 2026 7:10 AM
> To: Alex Deucher <[email protected]>
> Cc: [email protected]; Deucher, Alexander 
> <[email protected]>; Zhang, Hawking <[email protected]>; Feng, 
> Kenneth <[email protected]>
> Subject: RE: [PATCH] drm/amd/pm: correct mem_busy_percent display due to 
> calculation errors
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> >> What about returning -EBUSY or -EAGAIN if the value is negative?
>
> However, this wrong value will still affect how 'pp_dpm_uclk' and 
> 'gpu_metrics' show their results.
> While the 'mem_busy_percent' sysfs node is fine as it is, the KMD still needs 
> a special code path to handle it properly.
> Also, for 'gpu_metrics', the driver must return some value to fill the buffer 
> no matter what to avoid breaking node function.
>
> For "activity" variables, we can safely assume they should only be between 0 
> and 100.
> Besides, PMFW uses a similar method internally, So, I think limiting the 
> value to 0 is a reasonable fix.

Acked-by: Alex Deucher <[email protected]>

>
> Best Regards,
> Kevin
>
> -----Original Message-----
> From: Alex Deucher <[email protected]>
> Sent: Friday, March 27, 2026 5:33 AM
> To: Lazar, Lijo <[email protected]>
> Cc: Wang, Yang(Kevin) <[email protected]>; 
> [email protected]; Deucher, Alexander 
> <[email protected]>; Zhang, Hawking <[email protected]>; Feng, 
> Kenneth <[email protected]>
> Subject: Re: [PATCH] drm/amd/pm: correct mem_busy_percent display due to 
> calculation errors
>
> On Thu, Mar 26, 2026 at 12:37 AM Lazar, Lijo <[email protected]> wrote:
> >
> >
> >
> > On 26-Mar-26 7:34 AM, Yang Wang wrote:
> > > PMFW may return invalid values due to internal calculation errors.
> > > so, the kmd driver must validate and sanitize the returned values to
> > > prevent issues caused by firmware calculation errors.
> > >
> > > For example, values 0xfffe (-2) and 0xffff (-1) are treated as
> > > invalid and clamped to 0.
> > >
> >
> > The problem with clamping is that the issue takes a different
> > direction after that.
> >
> > Presently, the issue is reported as garbage values reported in activity.
> > With clamping, the issue could get reported as 100% activity with
> > light load or 0% activity with a heavy load. That will take the debug
> > in a different direction.
> >
> > Instead, isn't it better to keep this as some errata and let user apps
> > filter out garbage values? The previous or next sample could reflect
> > the activity correctly. Later fix can be added to newer firmware, if
> > that is an option.
>
> What about returning -EBUSY or -EAGAIN if the value is negative?
>
> Alex
>
> >
> > Thanks,
> > Lijo
> >
> >
> > > this applies to devices with CAB (Cache As Buffer) functionality.
> > >
> > > Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4905
> > >
> > > Signed-off-by: Yang Wang <[email protected]>
> > > ---
> > >   drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h   | 17 +++++++++++++++++
> > >   .../drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c    | 10 +++++-----
> > >   .../drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c    | 10 +++++-----
> > >   .../drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c    | 10 +++++-----
> > >   4 files changed, 32 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> > > b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> > > index 609f5ab07d8a..365946c43e11 100644
> > > --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> > > +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> > > @@ -2164,4 +2164,21 @@ static inline void smu_feature_init(struct 
> > > smu_context *smu, int feature_num)
> > >       smu_feature_list_clear_all(smu, SMU_FEATURE_LIST_ALLOWED);
> > >   }
> > >
> > > +/*
> > > + * smu_safe_u16_nn - Make u16 safe by filtering negative overflow
> > > +errors
> > > + * @val: Input u16 value, may contain invalid negative overflows
> > > + *
> > > + * Convert u16 to non-negative value. Cast to s16 to detect
> > > +negative values
> > > + * caused by calculation errors. Return 0 for negative errors,
> > > +return
> > > + * original value if valid.
> > > + *
> > > + * Return: Valid u16 value or 0
> > > + */
> > > +static inline u16 smu_safe_u16_nn(u16 val) {
> > > +    s16 tmp = (s16)val;
> > > +
> > > +    return tmp < 0 ? 0 : val;
> > > +}
> > > +
> > >   #endif
> > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> > > b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> > > index 9be7a2af560d..16f69b548ca4 100644
> > > --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> > > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> > > @@ -774,13 +774,13 @@ static int smu_v13_0_0_get_smu_metrics_data(struct 
> > > smu_context *smu,
> > >                       *value = metrics->AverageGfxclkFrequencyPreDs;
> > >               break;
> > >       case METRICS_AVERAGE_FCLK:
> > > -             if (metrics->AverageUclkActivity <= 
> > > SMU_13_0_0_BUSY_THRESHOLD)
> > > +             if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_13_0_0_BUSY_THRESHOLD)
> > >                       *value = metrics->AverageFclkFrequencyPostDs;
> > >               else
> > >                       *value = metrics->AverageFclkFrequencyPreDs;
> > >               break;
> > >       case METRICS_AVERAGE_UCLK:
> > > -             if (metrics->AverageUclkActivity <= 
> > > SMU_13_0_0_BUSY_THRESHOLD)
> > > +             if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_13_0_0_BUSY_THRESHOLD)
> > >                       *value = metrics->AverageMemclkFrequencyPostDs;
> > >               else
> > >                       *value = metrics->AverageMemclkFrequencyPreDs;
> > > @@ -801,7 +801,7 @@ static int smu_v13_0_0_get_smu_metrics_data(struct 
> > > smu_context *smu,
> > >               *value = metrics->AverageGfxActivity;
> > >               break;
> > >       case METRICS_AVERAGE_MEMACTIVITY:
> > > -             *value = metrics->AverageUclkActivity;
> > > +             *value =
> > > + smu_safe_u16_nn(metrics->AverageUclkActivity);
> > >               break;
> > >       case METRICS_AVERAGE_VCNACTIVITY:
> > >               *value = max(metrics->Vcn0ActivityPercentage,
> > > @@ -2086,7 +2086,7 @@ static ssize_t
> > > smu_v13_0_0_get_gpu_metrics(struct smu_context *smu,
> > >
> > > metrics->AvgTemperature[TEMP_VR_MEM1]);
> > >
> > >       gpu_metrics->average_gfx_activity = metrics->AverageGfxActivity;
> > > -     gpu_metrics->average_umc_activity = metrics->AverageUclkActivity;
> > > +     gpu_metrics->average_umc_activity =
> > > + smu_safe_u16_nn(metrics->AverageUclkActivity);
> > >       gpu_metrics->average_mm_activity =
> > > max(metrics->Vcn0ActivityPercentage,
> > >
> > > metrics->Vcn1ActivityPercentage);
> > >
> > > @@ -2103,7 +2103,7 @@ static ssize_t smu_v13_0_0_get_gpu_metrics(struct 
> > > smu_context *smu,
> > >       else
> > >               gpu_metrics->average_gfxclk_frequency =
> > > metrics->AverageGfxclkFrequencyPreDs;
> > >
> > > -     if (metrics->AverageUclkActivity <= SMU_13_0_0_BUSY_THRESHOLD)
> > > +     if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_13_0_0_BUSY_THRESHOLD)
> > >               gpu_metrics->average_uclk_frequency = 
> > > metrics->AverageMemclkFrequencyPostDs;
> > >       else
> > >               gpu_metrics->average_uclk_frequency =
> > > metrics->AverageMemclkFrequencyPreDs;
> > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
> > > b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
> > > index 5cc15545da6e..34a5973b9a06 100644
> > > --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
> > > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
> > > @@ -784,13 +784,13 @@ static int smu_v13_0_7_get_smu_metrics_data(struct 
> > > smu_context *smu,
> > >               *value = metrics->AverageGfxclkFrequencyPreDs;
> > >               break;
> > >       case METRICS_AVERAGE_FCLK:
> > > -             if (metrics->AverageUclkActivity <= 
> > > SMU_13_0_7_BUSY_THRESHOLD)
> > > +             if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_13_0_7_BUSY_THRESHOLD)
> > >                       *value = metrics->AverageFclkFrequencyPostDs;
> > >               else
> > >                       *value = metrics->AverageFclkFrequencyPreDs;
> > >               break;
> > >       case METRICS_AVERAGE_UCLK:
> > > -             if (metrics->AverageUclkActivity <= 
> > > SMU_13_0_7_BUSY_THRESHOLD)
> > > +             if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_13_0_7_BUSY_THRESHOLD)
> > >                       *value = metrics->AverageMemclkFrequencyPostDs;
> > >               else
> > >                       *value = metrics->AverageMemclkFrequencyPreDs;
> > > @@ -815,7 +815,7 @@ static int smu_v13_0_7_get_smu_metrics_data(struct 
> > > smu_context *smu,
> > >               *value = metrics->AverageGfxActivity;
> > >               break;
> > >       case METRICS_AVERAGE_MEMACTIVITY:
> > > -             *value = metrics->AverageUclkActivity;
> > > +             *value =
> > > + smu_safe_u16_nn(metrics->AverageUclkActivity);
> > >               break;
> > >       case METRICS_AVERAGE_SOCKETPOWER:
> > >               *value = metrics->AverageSocketPower << 8; @@ -2092,7
> > > +2092,7 @@ static ssize_t smu_v13_0_7_get_gpu_metrics(struct
> > > +smu_context *smu,
> > >
> > > metrics->AvgTemperature[TEMP_VR_MEM1]);
> > >
> > >       gpu_metrics->average_gfx_activity = metrics->AverageGfxActivity;
> > > -     gpu_metrics->average_umc_activity = metrics->AverageUclkActivity;
> > > +     gpu_metrics->average_umc_activity =
> > > + smu_safe_u16_nn(metrics->AverageUclkActivity);
> > >       gpu_metrics->average_mm_activity =
> > > max(metrics->Vcn0ActivityPercentage,
> > >
> > > metrics->Vcn1ActivityPercentage);
> > >
> > > @@ -2105,7 +2105,7 @@ static ssize_t smu_v13_0_7_get_gpu_metrics(struct 
> > > smu_context *smu,
> > >       else
> > >               gpu_metrics->average_gfxclk_frequency =
> > > metrics->AverageGfxclkFrequencyPreDs;
> > >
> > > -     if (metrics->AverageUclkActivity <= SMU_13_0_7_BUSY_THRESHOLD)
> > > +     if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_13_0_7_BUSY_THRESHOLD)
> > >               gpu_metrics->average_uclk_frequency = 
> > > metrics->AverageMemclkFrequencyPostDs;
> > >       else
> > >               gpu_metrics->average_uclk_frequency =
> > > metrics->AverageMemclkFrequencyPreDs;
> > > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> > > b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> > > index 28c1b084fe62..aaec3a251e0f 100644
> > > --- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> > > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> > > @@ -660,13 +660,13 @@ static int smu_v14_0_2_get_smu_metrics_data(struct 
> > > smu_context *smu,
> > >                       *value = metrics->AverageGfxclkFrequencyPreDs;
> > >               break;
> > >       case METRICS_AVERAGE_FCLK:
> > > -             if (metrics->AverageUclkActivity <= 
> > > SMU_14_0_2_BUSY_THRESHOLD)
> > > +             if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_14_0_2_BUSY_THRESHOLD)
> > >                       *value = metrics->AverageFclkFrequencyPostDs;
> > >               else
> > >                       *value = metrics->AverageFclkFrequencyPreDs;
> > >               break;
> > >       case METRICS_AVERAGE_UCLK:
> > > -             if (metrics->AverageUclkActivity <= 
> > > SMU_14_0_2_BUSY_THRESHOLD)
> > > +             if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_14_0_2_BUSY_THRESHOLD)
> > >                       *value = metrics->AverageMemclkFrequencyPostDs;
> > >               else
> > >                       *value = metrics->AverageMemclkFrequencyPreDs;
> > > @@ -687,7 +687,7 @@ static int smu_v14_0_2_get_smu_metrics_data(struct 
> > > smu_context *smu,
> > >               *value = metrics->AverageGfxActivity;
> > >               break;
> > >       case METRICS_AVERAGE_MEMACTIVITY:
> > > -             *value = metrics->AverageUclkActivity;
> > > +             *value =
> > > + smu_safe_u16_nn(metrics->AverageUclkActivity);
> > >               break;
> > >       case METRICS_AVERAGE_VCNACTIVITY:
> > >               *value = max(metrics->AverageVcn0ActivityPercentage,
> > > @@ -2146,7 +2146,7 @@ static ssize_t
> > > smu_v14_0_2_get_gpu_metrics(struct smu_context *smu,
> > >
> > > metrics->AvgTemperature[TEMP_VR_MEM1]);
> > >
> > >       gpu_metrics->average_gfx_activity = metrics->AverageGfxActivity;
> > > -     gpu_metrics->average_umc_activity = metrics->AverageUclkActivity;
> > > +     gpu_metrics->average_umc_activity =
> > > + smu_safe_u16_nn(metrics->AverageUclkActivity);
> > >       gpu_metrics->average_mm_activity =
> > > max(metrics->AverageVcn0ActivityPercentage,
> > >
> > > metrics->Vcn1ActivityPercentage);
> > >
> > > @@ -2158,7 +2158,7 @@ static ssize_t smu_v14_0_2_get_gpu_metrics(struct 
> > > smu_context *smu,
> > >       else
> > >               gpu_metrics->average_gfxclk_frequency =
> > > metrics->AverageGfxclkFrequencyPreDs;
> > >
> > > -     if (metrics->AverageUclkActivity <= SMU_14_0_2_BUSY_THRESHOLD)
> > > +     if (smu_safe_u16_nn(metrics->AverageUclkActivity) <=
> > > + SMU_14_0_2_BUSY_THRESHOLD)
> > >               gpu_metrics->average_uclk_frequency = 
> > > metrics->AverageMemclkFrequencyPostDs;
> > >       else
> > >               gpu_metrics->average_uclk_frequency =
> > > metrics->AverageMemclkFrequencyPreDs;
> >

Reply via email to