Hi,
On 2025-12-09 at 13:36:15 +0530, Anirban, Sk wrote:
> Hi,
>
> On 09-12-2025 12:46 pm, Krzysztof Karas wrote:
> > Hi Sk Anirban,
> >
> > On 2025-12-09 at 11:26:17 +0530, Sk Anirban wrote:
> > > Report GPU throttle reasons when RPS tests fail to reach expected
> > > frequencies or power levels.
> > >
> > > Signed-off-by: Sk Anirban <[email protected]>
> > > ---
> > > drivers/gpu/drm/i915/gt/selftest_rps.c | 17 +++++++++++++++++
> > > 1 file changed, 17 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > index 73bc91c6ea07..01c671e00e61 100644
> > > --- a/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
> > > @@ -1138,6 +1138,7 @@ int live_rps_power(void *arg)
> > > struct intel_engine_cs *engine;
> > > enum intel_engine_id id;
> > > struct igt_spinner spin;
> > > + u32 throttle;
> > > int err = 0;
> > > /*
> > > @@ -1216,6 +1217,13 @@ int live_rps_power(void *arg)
> > >         if (11 * min.power > 10 * max.power) {
> > >                 pr_err("%s: did not conserve power when setting lower frequency!\n",
> > >                        engine->name);
> > > +
> > > +               throttle = intel_uncore_read(gt->uncore,
> > > +                                            intel_gt_perf_limit_reasons_reg(gt));
> > > +
> > > +               pr_warn("%s: GPU throttled with reasons 0x%08x\n",
> > > +                       engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
> > > +
> > This feels like spamming the system messages. We are on error
> > path already and exiting with -EINVAL, so printing an
> > unconditional warning here is excessive, I think.
> >
> > [...]
> Got it. Based on past experience, most failures occur due to throttling.
> However, I can switch it to pr_info since we already print pr_err
> beforehand.
No, that would still bunch up two potential reasons for the failure.
If throttling is what trips the condition check:
    if (11 * min.power > 10 * max.power)
then throttling detection could have its own error path.
Something like this could work:
         if (11 * min.power > 10 * max.power) {
-                pr_err("%s: did not conserve power when setting lower frequency!\n",
-                       engine->name);
+                if (read_cagf(rps) != rps->max_freq) {
+                        throttle = intel_uncore_read(gt->uncore,
+                                                     intel_gt_perf_limit_reasons_reg(gt));
+                        pr_err("%s: GPU throttled with reasons 0x%08x\n",
+                               engine->name, throttle & GT0_PERF_LIMIT_REASONS_MASK);
+                } else {
+                        pr_err("%s: did not conserve power when setting lower frequency!\n",
+                               engine->name);
+                }
+
                 err = -EINVAL;
                 break;
         }
The main goal would be to differentiate and print only one
reason for failure, instead of emitting two of them and leaving
people guessing which one of the two was the real reason the
function returned with -EINVAL.
I did not test the code above, so it may require some changes.
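To make the "only one reason" idea easier to reason about outside the
driver, here is a minimal user-space sketch of the same decision. It is
not kernel code: classify_power_check(), format_power_check(), the enum
and SKETCH_PERF_LIMIT_REASONS_MASK are all hypothetical names made up
for this illustration, and the mask value is a placeholder rather than
the real GT0_PERF_LIMIT_REASONS_MASK. The frequencies and throttle bits
are passed in as plain integers instead of being read from hardware.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Placeholder value for this sketch only; the real
 * GT0_PERF_LIMIT_REASONS_MASK comes from the i915 register headers.
 */
#define SKETCH_PERF_LIMIT_REASONS_MASK 0x0000ffffu

/* Hypothetical result codes: exactly one outcome per check. */
enum power_check_result {
	POWER_CHECK_OK,
	POWER_CHECK_THROTTLED,
	POWER_CHECK_NOT_CONSERVED,
};

/*
 * Mirrors the selftest condition: the check fails when
 * 11 * min_power > 10 * max_power. On failure, pick a single cause:
 * throttling if the actual frequency differs from the maximum,
 * otherwise "power not conserved".
 */
static enum power_check_result
classify_power_check(uint64_t min_power, uint64_t max_power,
		     uint32_t actual_freq, uint32_t max_freq)
{
	if (11 * min_power <= 10 * max_power)
		return POWER_CHECK_OK;

	if (actual_freq != max_freq)
		return POWER_CHECK_THROTTLED;

	return POWER_CHECK_NOT_CONSERVED;
}

/*
 * Formats the one message that matches the result. Returns the
 * snprintf() result, or 0 (with an empty string) when nothing failed.
 */
static int
format_power_check(char *buf, size_t len, const char *engine,
		   enum power_check_result res, uint32_t throttle)
{
	switch (res) {
	case POWER_CHECK_THROTTLED:
		return snprintf(buf, len,
				"%s: GPU throttled with reasons 0x%08x\n",
				engine,
				(unsigned int)(throttle &
					       SKETCH_PERF_LIMIT_REASONS_MASK));
	case POWER_CHECK_NOT_CONSERVED:
		return snprintf(buf, len,
				"%s: did not conserve power when setting lower frequency!\n",
				engine);
	default:
		if (len)
			buf[0] = '\0';
		return 0;
	}
}
```

With this split, a caller formats at most one message per failed check,
so the log points at a single cause for the -EINVAL.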
--
Best Regards,
Krzysztof