A v3d core exposes a single set of HW performance counters, so at most
one perfmon can be programmed in HW at any moment. Currently, the driver
tracks the active perfmon with v3d_dev->active_perfmon, but three
long-standing issues make perfmon handling unreliable:

1. The active_perfmon pointer is accessed lock-free from scheduler
   callbacks, the GPU-reset path and the perfmon ioctls. Note that the
   v3d_perfmon->lock mutex serializes start/stop of one perfmon object
   against itself, but the invariant that needs protection is
   device-wide.

2. perfmon start/stop is hooked exclusively to run_job() callbacks via
   v3d_switch_perfmon(). If nothing is queued behind a perfmon-carrying
   job, the perfmon is never actually stopped.

3. A non-global perfmon should count events generated by a specific
   submission, but the scheduler can run jobs from different queues
   concurrently. Without explicit cross-queue serialization, an
   unrelated job running in parallel pollutes the counters and produces
   unusable results.

This series aims to address all three issues.

PATCH 1 is a minimal, stable-targeted fix for a separate problem: the
SET_GLOBAL ioctl leaks the perfmon reference on several paths. It is
kept self-contained so it can be backported on its own.

PATCH 2 moves the locking to where the invariant actually lives (fixing
issue #1) and replaces the sleeping mutex with a spinlock. The spinlock
allows us to stop the perfmon from the IRQ handler at job-completion
time (the natural boundary for "active perfmon follows the active job"),
which fixes issue #2.

PATCH 3 addresses issue #3 by building on the new locking to enforce
cross-queue serialization when a non-global perfmon is attached, by
adding scheduler fence dependencies during submission. The fence
dependencies allow us to enforce two rules:

1. A job that carries a non-global perfmon waits for every job currently
   in-flight across all HW queues to finish.

2. While such a job is in-flight, any subsequently submitted job waits
   for it.

Together, these rules ensure cross-queue isolation and make the
performance counter values reliable.

PATCH 4 is a cleanup that drops the now-redundant queue argument from
v3d_job_add_syncobjs(), as struct v3d_job carries its submission queue
after PATCH 3.

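For readers who want to reason about the two serialization rules of
PATCH 3 in isolation, here is a small userspace model. It is only a
sketch and is not driver code: every name in it (model_dev, model_job,
model_submit, model_can_run) is hypothetical, plain pointers stand in
for scheduler fences, and the in-order execution that each queue already
provides is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_JOBS 16 /* arbitrary bound for this sketch */

/* Hypothetical stand-in for a submitted job; not struct v3d_job. */
struct model_job {
	int queue;         /* HW queue the job was submitted to */
	bool has_perfmon;  /* job carries a non-global perfmon */
	bool done;         /* job has completed */
	const struct model_job *deps[MODEL_MAX_JOBS];
	size_t num_deps;
};

/* Hypothetical device state: every job submitted so far, all queues. */
struct model_dev {
	struct model_job *submitted[MODEL_MAX_JOBS];
	size_t num_submitted;
};

/*
 * Record the dependencies of a new job at submission time.
 *
 * Rule 1: a job with a perfmon depends on every unfinished job,
 *         whatever queue that job sits on.
 * Rule 2: a job without a perfmon depends on every unfinished job
 *         that carries one.
 */
void model_submit(struct model_dev *dev, struct model_job *job)
{
	for (size_t i = 0; i < dev->num_submitted; i++) {
		const struct model_job *other = dev->submitted[i];

		if (other->done)
			continue; /* finished jobs are no longer in-flight */

		if (job->has_perfmon || other->has_perfmon) {
			assert(job->num_deps < MODEL_MAX_JOBS);
			job->deps[job->num_deps++] = other;
		}
	}

	assert(dev->num_submitted < MODEL_MAX_JOBS);
	dev->submitted[dev->num_submitted++] = job;
}

/* A job may start only once all of its recorded dependencies are done. */
bool model_can_run(const struct model_job *job)
{
	for (size_t i = 0; i < job->num_deps; i++) {
		if (!job->deps[i]->done)
			return false;
	}
	return true;
}
```

In this model, submitting a plain job, then a perfmon job on another
queue, then a second plain job on a third queue yields a strict chain:
the perfmon job cannot start until the first job finishes, and the last
job cannot start until the perfmon job finishes. Two plain jobs on
different queues still have no dependency on each other, so normal
cross-queue concurrency is unaffected when no perfmon is attached.
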
To make sure that this series actually produces the expected results and
improves the overall reliability of v3d's performance monitors, it is
accompanied by an IGT series [1], which has already been merged.

This series depends on [2].

[1] https://lore.kernel.org/igt-dev/[email protected]/T/
[2] https://lore.kernel.org/dri-devel/[email protected]/T/

Best regards,
- Maíra

---
v1 -> v2: https://lore.kernel.org/r/[email protected]

- Rebased on top of "[PATCH v2 00/14] drm/v3d: Scheduler and submission
  fixes and refactoring"
- [1/4] NEW PATCH: "drm/v3d: Fix global performance monitor reference
  counting"
    - Minimal patch for stable branches only fixing the reference leaks
      in global perfmons.
- [2/4] Start/stop the global perfmon inside the set_global_perfmon
  ioctl and simplify global perfmon management across the helpers
  (Iago Toral)
    - In the reset path, before stopping the perfmon for the HW reset,
      v3d_reset() now re-arms the global perfmon with
      v3d_perfmon_resume(), as the global perfmon's start/stop points
      live only in the IOCTL.
    - v3d_perfmon_get_values_ioctl() no longer stops the perfmon, it
      only captures the values. Lifecycle management is left to the job
      (per-job perfmons) or the SET_GLOBAL ioctl (global perfmon).
- [2/4] In v3d_perfmon_delete(), first stop the perfmon and then check
  if it's a global perfmon (Iago Toral)
- [2/4] Add some comments to explain the refcount logic for global
  perfmons (Iago Toral)
- [3/4] Move the job->queue introduction to this patch instead of the
  previous one.
- [4/4] NEW PATCH: "drm/v3d: Drop the queue argument from
  v3d_job_add_syncobjs()"

---
Maíra Canal (4):
      drm/v3d: Fix global performance monitor reference counting
      drm/v3d: Refactor perfmon locking
      drm/v3d: Serialize jobs across queues when a perfmon is attached
      drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()

 drivers/gpu/drm/v3d/v3d_drv.h     |  50 ++++++++--
 drivers/gpu/drm/v3d/v3d_gem.c     |   7 +-
 drivers/gpu/drm/v3d/v3d_irq.c     |   7 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c | 195 +++++++++++++++++++++++++++++---------
 drivers/gpu/drm/v3d/v3d_power.c   |   4 +
 drivers/gpu/drm/v3d/v3d_sched.c   |  26 +----
 drivers/gpu/drm/v3d/v3d_submit.c  |  85 ++++++++++++++---
 7 files changed, 282 insertions(+), 92 deletions(-)
---
base-commit: 4c26e162947f91aa78ba57dd4fddd38fc80e7d60
change-id: 20260505-v3d-perfmon-lifetime-48c9ded1091b
prerequisite-change-id: 20260407-v3d-sched-misc-fixes-623739017e53:v2
prerequisite-patch-id: 01823e165a822ddec72b0b18e49c096d35149e9a
prerequisite-patch-id: 1df1c11ec62617336e2ed5445c24bbb912570035
prerequisite-patch-id: 738852cd3115283b43cef336ba8fe88616f28a88
prerequisite-patch-id: e1fa04bb45b0c1eb1478b2893b0dedcb6a825255
prerequisite-patch-id: 32c804d9921bcf259b236b3f1d74f7972aec02f2
prerequisite-patch-id: b1b437650405dd43ed324f9c02a0e591a793aec5
prerequisite-patch-id: f13898126dac8b6f14d8e1fba8804123f012889e
prerequisite-patch-id: e03f525b4491a3475ed0efa68bc8049f92be0bd0
prerequisite-patch-id: 971ac9100d4958aa2c0da2d734533eb9ea9d80b3
prerequisite-patch-id: 9d3d40ef80c032c0c05b058f6c397d04d1879236
prerequisite-patch-id: fb2e7a320a7ec3df75b74cfa6683a558f87135a2
prerequisite-patch-id: 29537da4c8f3c2a97f1350638e9da035cf3e7057
prerequisite-patch-id: 7303a43285dabde083921fe380a294c9026ef6e9
prerequisite-patch-id: bc42b0633cf33b2a693cc17534a6090f3813cc60
