A v3d core is able to expose a single set of HW performance counters, so at
any moment at most one perfmon can be programmed in HW. Currently, the
driver tracks the active perfmon with v3d_dev->active_perfmon, but three
long-standing issues make perfmon handling unreliable:

1. The active_perfmon pointer is accessed lock-free from scheduler
   callbacks, the GPU-reset path and the perfmon ioctls. Note that the
   v3d_perfmon->lock mutex serializes start/stop of one perfmon object
   against itself, but the invariant that needs protection is device-wide.

2. perfmon start/stop is hooked exclusively to run_job() callbacks via
   v3d_switch_perfmon(). If nothing is queued behind a perfmon-carrying
   job, the perfmon is never actually stopped.

3. A non-global perfmon should count events generated by a specific
   submission, but the scheduler can run jobs from different queues
   concurrently. Without explicit cross-queue serialization, an unrelated
   job running in parallel pollutes the counters and produces unusable
   results.

This series addresses all three issues. PATCH 1 moves the locking to
where the invariant actually lives, which fixes issue #1. It also replaces
the sleeping mutex with a spinlock, so the perfmon can be stopped from the
IRQ handler at job-completion time (the natural boundary for "active
perfmon follows the active job"), which fixes issue #2.
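
As a rough illustration of the idea behind PATCH 1, the userspace sketch
below models a device-wide lock guarding a single "active perfmon" pointer,
with the perfmon stopped at job completion instead of at the next
run_job(). It is not the driver code: every model_* name is hypothetical,
a C11 atomic_flag busy-wait stands in for the kernel spinlock, and global
perfmons are not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a perfmon object. */
struct model_perfmon {
	int id;
	int running;	/* 1 while "programmed in HW" */
};

/* Hypothetical stand-in for the device: one lock, one active perfmon. */
struct model_dev {
	atomic_flag lock;		/* models a device-wide spinlock */
	struct model_perfmon *active;	/* at most one perfmon at a time */
};

static void model_lock(struct model_dev *dev)
{
	/* Busy-wait until the flag is acquired; never sleeps. */
	while (atomic_flag_test_and_set_explicit(&dev->lock,
						 memory_order_acquire))
		;
}

static void model_unlock(struct model_dev *dev)
{
	atomic_flag_clear_explicit(&dev->lock, memory_order_release);
}

/* Job starts running: switch the active perfmon to the job's perfmon. */
static void model_job_start(struct model_dev *dev, struct model_perfmon *pm)
{
	model_lock(dev);
	if (dev->active != pm) {
		if (dev->active)
			dev->active->running = 0;
		if (pm)
			pm->running = 1;
		dev->active = pm;
	}
	model_unlock(dev);
}

/*
 * Job completes (IRQ context in a real driver): stop the job's perfmon
 * right away, even if no other job is queued behind it.
 */
static void model_job_done(struct model_dev *dev, struct model_perfmon *pm)
{
	model_lock(dev);
	if (pm && dev->active == pm) {
		pm->running = 0;
		dev->active = NULL;
	}
	model_unlock(dev);
}
```

Because the lock in this model never sleeps, the same helper can be called
from the completion path, which is what lets the perfmon stop without
waiting for another job to be scheduled.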

PATCH 2 addresses issue #3 by building on the new locking to enforce
cross-queue serialization when a non-global perfmon is attached, by adding
scheduler fence dependencies during submission. The fence dependencies
allow us to enforce two rules:

1. A job that carries a non-global perfmon waits for every job currently
   in-flight across all HW queues to finish.

2. While such a job is in-flight, any subsequently submitted job waits
   for it.

Together, these rules ensure cross-queue isolation and make the
performance counter values reliable.
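
The two rules above can be modelled with a small userspace sketch. This is
only an illustration under stated assumptions, not the code in PATCH 2: the
model_* names and NUM_QUEUES value are hypothetical, plain pointers stand in
for scheduler fences, and each queue is assumed to run its jobs in
submission order, so waiting on the last job of each queue is taken to
cover everything in flight on that queue.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_QUEUES 4			/* hypothetical queue count */
#define MAX_DEPS (NUM_QUEUES + 1)

struct model_job {
	int id;
	bool has_perfmon;	/* carries a non-global perfmon */
	bool done;
	const struct model_job *deps[MAX_DEPS];	/* models fence deps */
	size_t ndeps;
};

struct model_dev {
	struct model_job *last[NUM_QUEUES];	/* last job per queue */
	struct model_job *perfmon_job;		/* in-flight perfmon job */
};

/* Add a dependency unless it is absent, finished, or already recorded. */
static void add_dep(struct model_job *job, const struct model_job *dep)
{
	if (!dep || dep->done || job->ndeps >= MAX_DEPS)
		return;
	for (size_t i = 0; i < job->ndeps; i++)
		if (job->deps[i] == dep)
			return;
	job->deps[job->ndeps++] = dep;
}

static void model_submit(struct model_dev *dev, struct model_job *job,
			 int queue)
{
	/* Rule 1: a perfmon job waits for in-flight work on every queue. */
	if (job->has_perfmon)
		for (int q = 0; q < NUM_QUEUES; q++)
			add_dep(job, dev->last[q]);

	/* Rule 2: any job waits for the in-flight perfmon job, if any. */
	add_dep(job, dev->perfmon_job);

	dev->last[queue] = job;
	if (job->has_perfmon)
		dev->perfmon_job = job;
}

static void model_complete(struct model_dev *dev, struct model_job *job)
{
	job->done = true;
	if (dev->perfmon_job == job)
		dev->perfmon_job = NULL;
}
```

In this model, a job submitted after the perfmon job has completed picks
up no extra dependency, so the serialization cost is only paid while a
non-global perfmon is actually in use.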

To verify that this series produces the expected results and improves
the overall reliability of v3d's performance monitors, it is accompanied
by an IGT series [1] that adds new perfmon tests for v3d.

This series depends on [2].

[1] https://lore.kernel.org/igt-dev/[email protected]/T/
[2] https://lore.kernel.org/dri-devel/[email protected]/T/

Best regards,
- Maíra

---
Maíra Canal (2):
      drm/v3d: Refactor perfmon locking
      drm/v3d: Serialize jobs across queues when a perfmon is attached

 drivers/gpu/drm/v3d/v3d_drv.h     |  50 ++++++++++---
 drivers/gpu/drm/v3d/v3d_gem.c     |   6 +-
 drivers/gpu/drm/v3d/v3d_irq.c     |   6 ++
 drivers/gpu/drm/v3d/v3d_perfmon.c | 148 ++++++++++++++++++++++++++------------
 drivers/gpu/drm/v3d/v3d_power.c   |   4 ++
 drivers/gpu/drm/v3d/v3d_sched.c   |  24 +------
 drivers/gpu/drm/v3d/v3d_submit.c  |  90 ++++++++++++++++++++---
 7 files changed, 242 insertions(+), 86 deletions(-)
---
base-commit: c006978163fd001fbca55e5fa57bddcf49f47ad9
change-id: 20260505-v3d-perfmon-lifetime-48c9ded1091b
prerequisite-change-id: 20260407-v3d-sched-misc-fixes-623739017e53:v1
prerequisite-patch-id: 01823e165a822ddec72b0b18e49c096d35149e9a
prerequisite-patch-id: 1df1c11ec62617336e2ed5445c24bbb912570035
prerequisite-patch-id: 738852cd3115283b43cef336ba8fe88616f28a88
prerequisite-patch-id: 8614f719aa20e50a006ed6ce32133cdae287988d
prerequisite-patch-id: 32c804d9921bcf259b236b3f1d74f7972aec02f2
prerequisite-patch-id: 737306a958062699aa5c0ac34fb4dfebed9ec5bc
prerequisite-patch-id: dd4af1c51cf194a331814ba0ed997da19db633e3
prerequisite-patch-id: 62fbbb75f3d28b21ac3922dd36aec6e7721d2c55
prerequisite-patch-id: b5bd534295a27c8d4539bab81db0a4cfef4ea242
prerequisite-patch-id: 11274552d6ed495d712083c5dced8aa2886185d2
