A v3d core exposes a single set of HW performance counters, so at most
one perfmon can be programmed in HW at any moment. The driver currently
tracks the active perfmon with v3d_dev->active_perfmon, but three
long-standing issues make perfmon handling unreliable:

1. The active_perfmon pointer is accessed lock-free from scheduler
   callbacks, the GPU-reset path and the perfmon ioctls. Note that the
   v3d_perfmon->lock mutex only serializes start/stop of one perfmon
   object against itself, but the invariant that needs protection is
   device-wide.

2. perfmon start/stop is hooked exclusively into the run_job() callbacks
   via v3d_switch_perfmon(). If nothing is queued behind a
   perfmon-carrying job, the perfmon is never actually stopped.

3. A non-global perfmon should count events generated by a specific
   submission, but the scheduler can run jobs from different queues
   concurrently. Without explicit cross-queue serialization, an
   unrelated job running in parallel pollutes the counters and produces
   unusable results.

This series aims to address all three issues.

PATCH 1 moves the locking to where the invariant actually lives (fixing
issue #1) and replaces the sleeping mutex with a spinlock. The spinlock
allows us to stop the perfmon from the IRQ handler at job-completion
time (the natural boundary for "active perfmon follows the active job"),
which fixes issue #2.

PATCH 2 addresses issue #3 by building on the new locking to enforce
cross-queue serialization when a non-global perfmon is attached, by
adding scheduler fence dependencies during submission. The fence
dependencies allow us to enforce two rules:

1. A job that carries a non-global perfmon waits for every job currently
   in-flight across all HW queues to finish.

2. While such a job is in-flight, any subsequently submitted job waits
   for it.

This ensures cross-queue isolation and makes the performance counter
values reliable.

To make sure that this series actually produces the expected results and
improves the overall reliability of v3d's performance monitors, it is
accompanied by an IGT series [1] that adds new perfmon tests for v3d.

This series depends on [2].

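As a rough illustration of the two serialization rules, here is a small
userspace model. It is only a sketch and not the driver code: struct
model_job, struct model_dev, model_submit(), model_complete() and
depends_on() are made-up names, and the model records job ids where the
series adds scheduler fence dependencies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS 16
#define MAX_DEPS 16

/* Hypothetical stand-in for a submitted job (not struct v3d_job). */
struct model_job {
	int id;
	bool has_perfmon;	/* carries a non-global perfmon */
	int deps[MAX_DEPS];	/* ids of the jobs this job waits for */
	size_t ndeps;
};

/* Hypothetical device state: every job in flight on any HW queue. */
struct model_dev {
	struct model_job *inflight[MAX_JOBS];
	size_t ninflight;
};

static void add_dep(struct model_job *job, const struct model_job *on)
{
	assert(job->ndeps < MAX_DEPS);
	job->deps[job->ndeps++] = on->id;
}

/* Apply both rules at submission time, then track the job as in-flight. */
static void model_submit(struct model_dev *dev, struct model_job *job)
{
	size_t i;

	job->ndeps = 0;
	for (i = 0; i < dev->ninflight; i++) {
		struct model_job *other = dev->inflight[i];

		/*
		 * Rule 1: a perfmon job waits for everything in flight.
		 * Rule 2: any job waits for an in-flight perfmon job.
		 */
		if (job->has_perfmon || other->has_perfmon)
			add_dep(job, other);
	}

	assert(dev->ninflight < MAX_JOBS);
	dev->inflight[dev->ninflight++] = job;
}

/* Job completion: drop the job from the in-flight set. */
static void model_complete(struct model_dev *dev, struct model_job *job)
{
	size_t i;

	for (i = 0; i < dev->ninflight; i++) {
		if (dev->inflight[i] == job) {
			dev->inflight[i] = dev->inflight[--dev->ninflight];
			return;
		}
	}
}

/* Returns true if @job recorded a dependency on the job with @id. */
static bool depends_on(const struct model_job *job, int id)
{
	size_t i;

	for (i = 0; i < job->ndeps; i++)
		if (job->deps[i] == id)
			return true;
	return false;
}
```

In this model, two plain jobs submitted back to back get no
dependencies on each other, a perfmon job submitted after them depends
on both, and a plain job submitted while the perfmon job is still
in flight depends only on the perfmon job. Once the perfmon job
completes, later plain jobs are unconstrained again.
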
[1] https://lore.kernel.org/igt-dev/[email protected]/T/
[2] https://lore.kernel.org/dri-devel/[email protected]/T/

Best regards,
- Maíra

---
Maíra Canal (2):
      drm/v3d: Refactor perfmon locking
      drm/v3d: Serialize jobs across queues when a perfmon is attached

 drivers/gpu/drm/v3d/v3d_drv.h     |  50 ++++++++++---
 drivers/gpu/drm/v3d/v3d_gem.c     |   6 +-
 drivers/gpu/drm/v3d/v3d_irq.c     |   6 ++
 drivers/gpu/drm/v3d/v3d_perfmon.c | 148 ++++++++++++++++++++++++++------------
 drivers/gpu/drm/v3d/v3d_power.c   |   4 ++
 drivers/gpu/drm/v3d/v3d_sched.c   |  24 +------
 drivers/gpu/drm/v3d/v3d_submit.c  |  90 ++++++++++++++++++++---
 7 files changed, 242 insertions(+), 86 deletions(-)
---
base-commit: c006978163fd001fbca55e5fa57bddcf49f47ad9
change-id: 20260505-v3d-perfmon-lifetime-48c9ded1091b
prerequisite-change-id: 20260407-v3d-sched-misc-fixes-623739017e53:v1
prerequisite-patch-id: 01823e165a822ddec72b0b18e49c096d35149e9a
prerequisite-patch-id: 1df1c11ec62617336e2ed5445c24bbb912570035
prerequisite-patch-id: 738852cd3115283b43cef336ba8fe88616f28a88
prerequisite-patch-id: 8614f719aa20e50a006ed6ce32133cdae287988d
prerequisite-patch-id: 32c804d9921bcf259b236b3f1d74f7972aec02f2
prerequisite-patch-id: 737306a958062699aa5c0ac34fb4dfebed9ec5bc
prerequisite-patch-id: dd4af1c51cf194a331814ba0ed997da19db633e3
prerequisite-patch-id: 62fbbb75f3d28b21ac3922dd36aec6e7721d2c55
prerequisite-patch-id: b5bd534295a27c8d4539bab81db0a4cfef4ea242
prerequisite-patch-id: 11274552d6ed495d712083c5dced8aa2886185d2
