On Tue, May 5, 2026 at 1:46 PM Maíra Canal <[email protected]> wrote:
>
> Hi Rob,

Thanks for the review Maíra!

> On 06/03/26 13:36, Rob Herring (Arm) wrote:
> > The Arm Ethos-U NPUs have a PMU with performance counters. The PMU h/w
> > supports up to 4 (U65) or 8 (U85) counters which can be programmed for
> > different events. There is also a dedicated cycle counter.
> >
> > The ABI and implementation are copied from the V3D driver. The main
>
> I'm planning to send a series fixing some perfmon locking issues in the
> V3D driver later this week. But, to give you a brief summary of the main
> issues:
>
> 1. The active perfmon is not protected against races (run_job() vs.
> IOCTLs).
>
> 2. ethosu_switch_perfmon() only starts/stops the perfmon in run_job().
> This means that if nothing is further queued up, a perfmon will never
> actually be stopped.

For ethosu, other than idle and cycle counts, nothing else counts when
a job is done. Idle is just the time from enabling the counter to
start a job and then from the job stopping to reading the counters.
IOW, the idle count while a job is running is 0 and
idle+active=cycles.


> More comments inline:
>
> > difference in the ABI is there is no query API for the the event list.
> > The events differ between the U65 and U85, so the events lists are
> > maintained in userspace along with other differences between the U65 and
> > U85.
> >
> > The cycle counter is always enabled when the PMU is enabled. When the
> > user requests N events, reading the counters will return the N events
> > plus the cycle counter.
> >
> > Signed-off-by: Rob Herring (Arm) <[email protected]>
> > ---
> > MR for mesa with initial support is here:
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40269
> >
> >   drivers/accel/ethosu/Makefile         |   2 +-
> >   drivers/accel/ethosu/ethosu_device.h  |  21 ++
> >   drivers/accel/ethosu/ethosu_drv.c     |  17 +-
> >   drivers/accel/ethosu/ethosu_drv.h     |  62 ++++++
> >   drivers/accel/ethosu/ethosu_job.c     |  35 +++-
> >   drivers/accel/ethosu/ethosu_job.h     |   2 +
> >   drivers/accel/ethosu/ethosu_perfmon.c | 282 ++++++++++++++++++++++++++
> >   include/uapi/drm/ethosu_accel.h       |  59 +++++-
> >   8 files changed, 469 insertions(+), 11 deletions(-)
> >   create mode 100644 drivers/accel/ethosu/ethosu_perfmon.c
> >
>
> [...]
>
> > diff --git a/drivers/accel/ethosu/ethosu_drv.h 
> > b/drivers/accel/ethosu/ethosu_drv.h
> > index 9e21dfe94184..57eee79e4b83 100644
> > --- a/drivers/accel/ethosu/ethosu_drv.h
> > +++ b/drivers/accel/ethosu/ethosu_drv.h
> > @@ -3,13 +3,75 @@
> >   #ifndef __ETHOSU_DRV_H__
> >   #define __ETHOSU_DRV_H__
> >
> > +#include <linux/idr.h>
> > +#include <linux/mutex.h>
> >   #include <drm/gpu_scheduler.h>
> >
> >   struct ethosu_device;
> > +struct drm_device;
> > +struct drm_file;
> >
> >   struct ethosu_file_priv {
> >       struct ethosu_device *edev;
> >       struct drm_sched_entity sched_entity;
> > +
> > +     struct {
> > +             struct idr idr;
>
> You may want to consider moving to XArray as we did in V3D [1].

Okay. I'll have to study that some. I still don't have my head around XArray...

>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a5b0d095bcdb219348ed8ae1c97ee99fc4913b8
>
> > +             struct mutex lock;
> > +     } perfmon;
> > +};
> > +
> > +/* Performance monitor object. The perform lifetime is controlled by 
> > userspace
> > + * using perfmon related ioctls. A perfmon can be attached to a submit_cl
> > + * request, and when this is the case, HW perf counters will be activated 
> > just
> > + * before the submit_cl is submitted to the GPU and disabled when the job 
> > is
> > + * done. This way, only events related to a specific job will be counted.
> > + */
> > +struct ethosu_perfmon {
> > +     /* Tracks the number of users of the perfmon, when this counter 
> > reaches
> > +      * zero the perfmon is destroyed.
> > +      */
> > +     refcount_t refcnt;
> > +
> > +     /* Protects perfmon stop, as it can be invoked from multiple places. 
> > */
> > +     struct mutex lock;
>
> I'm not familiar with the Ethos hardware, but *if* it has only a single
> performance monitor active in HW at any given moment (like v3d), this
> lock doesn't enforce this restriction.
>
> This lock only serializes start/stop of *one* perfmon object against
> itself, but the invariant (at least on v3d) that needs protection is
> device-wide: there is exactly one active perfmon at any moment in HW.

You mean V3D has 1 perfmon shared among its 3? h/w queues? We only
have 1 queue. We only touch the perfmon when we touch the h/w to
run/end a job, so the same serialization for the job accesses should
cover the PMU accesses.

Rob

Reply via email to