A v3d core exposes a single set of HW performance counters, so at
any moment at most one perfmon can be programmed in HW. Currently, the
driver tracks the active perfmon with v3d_dev->active_perfmon, but three
long-standing issues make perfmon handling unreliable:

1. The active_perfmon pointer is accessed lock-free from scheduler
   callbacks, the GPU-reset path and the perfmon ioctls. The
   v3d_perfmon->lock mutex only serializes start/stop of a single perfmon
   object against itself, but the invariant that needs protection (which
   perfmon is programmed in HW) is device-wide.

2. Perfmon start/stop is hooked exclusively to run_job() callbacks via
   v3d_switch_perfmon(), so a perfmon is only stopped when a later job
   runs. If nothing is queued behind a perfmon-carrying job, the perfmon is
   never actually stopped.

3. A non-global perfmon should count events generated by a specific
   submission, but the scheduler can run jobs from different queues
   concurrently. Without explicit cross-queue serialization, an unrelated
   job running in parallel pollutes the counters and produces unusable
   results.

This series aims to address all three issues.

PATCH 1 is a minimal, stable-targeted fix for a separate problem: the
SET_GLOBAL ioctl leaks the perfmon reference on several paths. It is kept
self-contained so it can be backported on its own.

PATCH 2 moves the locking to where the invariant actually lives, the
device (fixing issue #1), and replaces the sleeping mutex with a spinlock.
Since a spinlock can be taken in IRQ context, the perfmon can now be
stopped from the IRQ handler at job-completion time, the natural boundary
for "active perfmon follows the active job", which fixes issue #2.
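To illustrate the locking model, here is a minimal userspace sketch. It is
not the driver code: model_dev, model_perfmon, model_job_start() and
model_job_done_irq() are hypothetical names, a C11 atomic_flag busy-wait
stands in for the kernel spinlock, and the global perfmon is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; not the real struct v3d_dev / v3d_perfmon. */
struct model_perfmon {
	bool running;                           /* counters programmed in "HW" */
};

struct model_dev {
	atomic_flag perfmon_lock;               /* device-wide, never sleeps */
	struct model_perfmon *active_perfmon;   /* protected by perfmon_lock */
};

static void model_lock(struct model_dev *dev)
{
	/* Busy-wait like a spinlock, so it is usable from "IRQ" context. */
	while (atomic_flag_test_and_set_explicit(&dev->perfmon_lock,
						 memory_order_acquire)) {
		/* spin */
	}
}

static void model_unlock(struct model_dev *dev)
{
	atomic_flag_clear_explicit(&dev->perfmon_lock, memory_order_release);
}

/* Model of run_job(): make @pm the perfmon programmed in HW. */
static void model_job_start(struct model_dev *dev, struct model_perfmon *pm)
{
	assert(pm != NULL);
	model_lock(dev);
	if (dev->active_perfmon != pm) {
		if (dev->active_perfmon)
			dev->active_perfmon->running = false; /* stop previous */
		pm->running = true;                         /* start new one */
		dev->active_perfmon = pm;
	}
	model_unlock(dev);
}

/* Model of the completion IRQ for a job that carried @pm. */
static void model_job_done_irq(struct model_dev *dev, struct model_perfmon *pm)
{
	model_lock(dev);
	/* Only stop it if it is still the active perfmon. */
	if (dev->active_perfmon == pm) {
		pm->running = false;
		dev->active_perfmon = NULL;
	}
	model_unlock(dev);
}
```

In this model the perfmon is stopped when its job completes, even if no
other job is queued behind it, and the identity check keeps a late
completion for an older job from stopping a newer job's counters.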

PATCH 3 addresses issue #3. Building on the new locking, it enforces
cross-queue serialization when a non-global perfmon is attached by adding
scheduler fence dependencies during submission. These dependencies enforce
two rules:

1. A job that carries a non-global perfmon waits for every job currently
   in-flight across all HW queues to finish.

2. While such a job is in-flight, any subsequently submitted job waits
   for it.

Together, these rules isolate a perfmon-carrying job from jobs on other
queues, keeping its performance counter values reliable.
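The two rules can be sketched as a dependency computation at submit time.
This is a hypothetical userspace model, not the code from PATCH 3:
model_job, model_sched, model_submit() and MODEL_NUM_QUEUES are
illustrative names, a plain `done` flag stands in for a scheduler fence,
and it assumes jobs on a single queue finish in submission order, so
depending on the last job of each queue covers everything in flight there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_QUEUES 4                      /* hypothetical queue count */
#define MODEL_MAX_DEPS   (MODEL_NUM_QUEUES + 1)

struct model_job {
	int queue;              /* HW queue this job is submitted to */
	bool perfmon;           /* carries a non-global perfmon */
	bool done;              /* stands in for the job's finished fence */
	struct model_job *deps[MODEL_MAX_DEPS];
	size_t num_deps;
};

struct model_sched {
	/* Last submitted job per queue (assumed to complete in order). */
	struct model_job *last[MODEL_NUM_QUEUES];
	/* Most recently submitted perfmon-carrying job, if any. */
	struct model_job *perfmon_job;
};

/* Record a dependency on @dep unless it is absent or already finished. */
static void model_add_dep(struct model_job *job, struct model_job *dep)
{
	if (dep && !dep->done && dep != job) {
		assert(job->num_deps < MODEL_MAX_DEPS);
		job->deps[job->num_deps++] = dep;
	}
}

static void model_submit(struct model_sched *s, struct model_job *job)
{
	if (job->perfmon) {
		/* Rule 1: wait for every in-flight job on all queues. */
		for (int q = 0; q < MODEL_NUM_QUEUES; q++)
			model_add_dep(job, s->last[q]);
		s->perfmon_job = job;
	} else {
		/* Rule 2: while a perfmon job is in flight, wait for it. */
		model_add_dep(job, s->perfmon_job);
	}
	s->last[job->queue] = job;
}
```

A perfmon-carrying job submitted after jobs on queues 0 and 1 ends up
depending on both, and a later job on any queue depends on the perfmon
job until it is done.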

PATCH 4 is a cleanup that drops the now-redundant queue argument from
v3d_job_add_syncobjs(), as struct v3d_job carries its submission queue
after PATCH 3.

To make sure that this series actually produces the expected results and
improves the overall reliability of v3d's performance monitors, it is
accompanied by an IGT series [1], which has already been merged.

This series depends on [2].

[1] https://lore.kernel.org/igt-dev/[email protected]/T/
[2] https://lore.kernel.org/dri-devel/[email protected]/T/

Best regards,
- Maíra

---
v1 -> v2: https://lore.kernel.org/r/[email protected]

- Rebased on top of "[PATCH v2 00/14] drm/v3d: Scheduler and submission
  fixes and refactoring"
- [1/4] NEW PATCH: "drm/v3d: Fix global performance monitor reference counting"
        - Minimal patch for stable branches only fixing the reference leaks
          in global perfmons.
- [2/4] Start/stop the global perfmon inside the set_global_perfmon ioctl and
        simplify global perfmon management across the helpers (Iago Toral)
        - In the reset path, before stopping the perfmon for the HW reset,
          v3d_reset() now re-arms the global perfmon with v3d_perfmon_resume(),
          as the global perfmon's start/stop points live only in the IOCTL.
        - v3d_perfmon_get_values_ioctl() no longer stops the perfmon, it
          only captures the values. Lifecycle management is left to the job
          (per-job perfmons) or the SET_GLOBAL ioctl (global perfmon).
- [2/4] In v3d_perfmon_delete(), stop the perfmon first and only then
        check whether it's a global perfmon (Iago Toral)
- [2/4] Add some comments to explain the refcount logic for global perfmons
        (Iago Toral)
- [3/4] Move the job->queue introduction to this patch instead of the
        previous one.
- [4/4] NEW PATCH: "drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()"

---
Maíra Canal (4):
      drm/v3d: Fix global performance monitor reference counting
      drm/v3d: Refactor perfmon locking
      drm/v3d: Serialize jobs across queues when a perfmon is attached
      drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()

 drivers/gpu/drm/v3d/v3d_drv.h     |  50 ++++++++--
 drivers/gpu/drm/v3d/v3d_gem.c     |   7 +-
 drivers/gpu/drm/v3d/v3d_irq.c     |   7 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c | 195 +++++++++++++++++++++++++++++---------
 drivers/gpu/drm/v3d/v3d_power.c   |   4 +
 drivers/gpu/drm/v3d/v3d_sched.c   |  26 +----
 drivers/gpu/drm/v3d/v3d_submit.c  |  85 ++++++++++++++---
 7 files changed, 282 insertions(+), 92 deletions(-)
---
base-commit: 4c26e162947f91aa78ba57dd4fddd38fc80e7d60
change-id: 20260505-v3d-perfmon-lifetime-48c9ded1091b
prerequisite-change-id: 20260407-v3d-sched-misc-fixes-623739017e53:v2
prerequisite-patch-id: 01823e165a822ddec72b0b18e49c096d35149e9a
prerequisite-patch-id: 1df1c11ec62617336e2ed5445c24bbb912570035
prerequisite-patch-id: 738852cd3115283b43cef336ba8fe88616f28a88
prerequisite-patch-id: e1fa04bb45b0c1eb1478b2893b0dedcb6a825255
prerequisite-patch-id: 32c804d9921bcf259b236b3f1d74f7972aec02f2
prerequisite-patch-id: b1b437650405dd43ed324f9c02a0e591a793aec5
prerequisite-patch-id: f13898126dac8b6f14d8e1fba8804123f012889e
prerequisite-patch-id: e03f525b4491a3475ed0efa68bc8049f92be0bd0
prerequisite-patch-id: 971ac9100d4958aa2c0da2d734533eb9ea9d80b3
prerequisite-patch-id: 9d3d40ef80c032c0c05b058f6c397d04d1879236
prerequisite-patch-id: fb2e7a320a7ec3df75b74cfa6683a558f87135a2
prerequisite-patch-id: 29537da4c8f3c2a97f1350638e9da035cf3e7057
prerequisite-patch-id: 7303a43285dabde083921fe380a294c9026ef6e9
prerequisite-patch-id: bc42b0633cf33b2a693cc17534a6090f3813cc60