From: Tvrtko Ursulin <tvrtko.ursu...@intel.com>

While thinking about sporadic failures of perf_pmu/rc6-runtime-pm* tests
on some CI machines I have concluded that: a) the PMU readout of RC6 can
race against runtime PM transitions, and b) there are other reasons than
being runtime suspended which can cause intel_runtime_pm_get_if_in_use to
fail.

Therefore when estimating RC6 the code needs to assert we are indeed in
suspended state and if not the best we can do is return the last known RC6
value.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
Cc: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Imre Deak <imre.d...@intel.com>
---
I was able to trigger state != RPM_SUSPENDED on the shards, but not yet
the actual estimation overaccounting. As such this fix is based partially
on speculation that it will fix the sporadic perf_pmu/rc6* failures.
Nevertheless I think it is correct to add this check regardless.
---
 drivers/gpu/drm/i915/i915_pmu.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index bd7e695fc663..e92a9571db77 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -473,6 +473,30 @@ static u64 get_rc6(struct drm_i915_private *i915)
                spin_lock_irqsave(&i915->pmu.lock, flags);
                spin_lock(&kdev->power.lock);
 
+               /*
+                * After the above branch intel_runtime_pm_get_if_in_use failed
+                * to get the runtime PM reference we cannot assume we are in
+                * runtime suspend since we can either: a) race with coming out
+                * of it before we took the power.lock, or b) there are other
+                * states than suspended which can bring us here.
+                *
+                * We need to double-check that we are indeed currently runtime
+                * suspended and if not we cannot do better than report the last
+                * known RC6 value.
+                */
+               if (kdev->power.runtime_status != RPM_SUSPENDED) {
+                       spin_unlock(&kdev->power.lock);
+
+                       if (i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
+                               val = 
i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
+                       else
+                               val = i915->pmu.sample[__I915_SAMPLE_RC6].cur;
+
+                       spin_unlock_irqrestore(&i915->pmu.lock, flags);
+
+                       return val;
+               }
+
                if (!i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
                        i915->pmu.suspended_jiffies_last =
                                                kdev->power.suspended_jiffies;
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to