On Mon, 15 Dec 2025 17:14:47 +0000
Lukas Zapolskas <[email protected]> wrote:

> This patch extends the DEV_QUERY ioctl to return information about the
> performance counter setup for userspace, and introduces the new
> ioctl DRM_PANTHOR_PERF_CONTROL in order to allow for the sampling of
> performance counters.
> 
> The new design is inspired by the perf aux ringbuffer [0], with the
> insert and extract indices being mapped to userspace, allowing
> multiple samples to be exposed at any given time. To avoid pointer
> chasing, the sample metadata and block metadata are inline with
> the elements they describe.
> 
> Userspace is responsible for passing in resources for samples to be
> exposed, including the event file descriptor for notification of new
> sample availability, the ringbuffer BO to store samples, and the
> control BO along with the offset for mapping the insert and extract
> indices. Though these indices are only a total of 8 bytes, userspace
> can then reuse the same physical page for tracking the state of
> multiple buffers by giving different offsets from the BO start to
> map them.
> 
> [0]: https://docs.kernel.org/userspace-api/perf_ring_buffer.html
> 
> Co-developed-by: Mihail Atanassov <[email protected]>
> Signed-off-by: Mihail Atanassov <[email protected]>
> Signed-off-by: Lukas Zapolskas <[email protected]>
> Reviewed-by: Adrián Larumbe <[email protected]>

A couple of things pointed out by Adrián have not been fixed, I think
(see below).

> ---
>  include/uapi/drm/panthor_drm.h | 565 +++++++++++++++++++++++++++++++++
>  1 file changed, 565 insertions(+)
> 
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index e238c6264fa1..d1a92172e878 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h

[...]

> +/**
> + * struct drm_panthor_perf_info - Performance counter interface information
> + *
> + * Structure grouping all queryable information relating to the performance counter
> + * interfaces.
> + */
> +struct drm_panthor_perf_info {
> +     /**
> +      * @counters_per_block: The number of 8-byte counters available in a block.
> +      */
> +     __u32 counters_per_block;
> +
> +     /**
> +      * @sample_header_size: The size of the header struct available at the beginning
> +      * of every sample.
> +      */
> +     __u32 sample_header_size;
> +
> +     /**
> +      * @block_header_size: The size of the header struct inline with the counters for a
> +      * single block.
> +      */
> +     __u32 block_header_size;
> +
> +     /**
> +      * @sample_size: The size of a fully annotated sample, starting with a sample header
> +      *               of size @sample_header_size bytes, and all available blocks for the current
> +      *               configuration, each comprised of @counters_per_block 64-bit counters and
> +      *               a block header of @block_header_size bytes.
> +      *
> +      *               The user must use this field to allocate size for the ring buffer. In
> +      *               the case of new blocks being added, an old userspace can always use
> +      *               this field and ignore any blocks it does not know about.
> +      */
> +     __u32 sample_size;
> +
> +     /** @flags: Combination of drm_panthor_perf_feat_flags flags. */
> +     __u32 flags;
> +
> +     /**
> +      * @supported_clocks: Bitmask of the clocks supported by the GPU.
> +      *
> +      * Each bit represents a variant of the enum drm_panthor_perf_clock.
> +      *
> +      * For the same GPU, different implementers may have different clocks for the same hardware
> +      * block. At the moment, up to three clocks are supported, and any clocks that are present
> +      * will be reported here.
> +      */
> +     __u32 supported_clocks;
> +
> +     /** @fw_blocks: Number of FW blocks available. */
> +     __u32 fw_blocks;
> +
> +     /** @cshw_blocks: Number of CSHW blocks available. */
> +     __u32 cshw_blocks;
> +
> +     /** @tiler_blocks: Number of tiler blocks available. */
> +     __u32 tiler_blocks;
> +
> +     /** @memsys_blocks: Number of memsys blocks available. */
> +     __u32 memsys_blocks;
> +
> +     /** @shader_blocks: Number of shader core blocks available. */
> +     __u32 shader_blocks;

You need an extra

        __u32 pad;

to keep the struct size a multiple of 8 bytes (the eleven __u32 fields
above add up to 44 bytes).
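
Something like the following compile-time check would catch this. It is
only a sketch: struct perf_info_padded is a hypothetical userspace mirror
of the fields quoted above with the trailing pad added, not the actual
uAPI definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of drm_panthor_perf_info with the pad added. */
struct perf_info_padded {
	uint32_t counters_per_block;
	uint32_t sample_header_size;
	uint32_t block_header_size;
	uint32_t sample_size;
	uint32_t flags;
	uint32_t supported_clocks;
	uint32_t fw_blocks;
	uint32_t cshw_blocks;
	uint32_t tiler_blocks;
	uint32_t memsys_blocks;
	uint32_t shader_blocks;
	uint32_t pad;	/* explicit padding, expected to be zero */
};

/* The pad starts right after the eleven existing 4-byte fields. */
static_assert(offsetof(struct perf_info_padded, pad) == 11 * sizeof(uint32_t),
	      "pad must directly follow shader_blocks");

/* With the pad, the size is a multiple of 8 on 32-bit and 64-bit ABIs. */
static_assert(sizeof(struct perf_info_padded) % 8 == 0,
	      "struct size must be a multiple of 8 bytes");
```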

> +};
> +

[...]

> +
> +/**
> + * struct drm_panthor_perf_ringbuf_control - Struct used to map in the ring buffer control indices
> + *                                           into memory shared between user and kernel.
> + *
> + */
> +struct drm_panthor_perf_ringbuf_control {
> +     /**
> +      * @extract_idx: The index of the latest sample that was processed by userspace. Only
> +      *               modifiable by userspace.
> +      */
> +     __u64 extract_idx;
> +
> +     /**
> +      * @insert_idx: The index of the latest sample emitted by the kernel. Only modifiable by
> +      *               modifiable by the kernel.

"modifiable by" is repeated here.

> +      */
> +     __u64 insert_idx;
> +};
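
For reference, this is roughly how I would expect userspace to consume
samples with these two indices. It is a sketch under assumptions the
quoted hunks do not confirm: the indices are free-running counters, a
sample lives at slot (index % slot_count), and each slot is sample_size
bytes. struct ringbuf_control, sample_fn and drain_samples() are
hypothetical names, and the __atomic builtins are the GCC/Clang ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the mapped control indices. */
struct ringbuf_control {
	uint64_t extract_idx;	/* written by userspace only */
	uint64_t insert_idx;	/* written by the kernel only */
};

/* Hypothetical per-sample callback, not part of the uAPI. */
typedef void (*sample_fn)(const uint8_t *sample, size_t size);

/*
 * Hand every pending sample to cb (if non-NULL) and publish the new
 * extract index. Returns the number of samples consumed.
 */
static uint64_t drain_samples(struct ringbuf_control *ctrl,
			      const uint8_t *ring, size_t sample_size,
			      uint64_t slot_count, sample_fn cb)
{
	uint64_t insert, extract, consumed = 0;

	assert(slot_count != 0);

	/* Acquire: sample data written before insert_idx must be visible. */
	insert = __atomic_load_n(&ctrl->insert_idx, __ATOMIC_ACQUIRE);
	extract = ctrl->extract_idx;

	while (extract != insert) {
		if (cb)
			cb(ring + (extract % slot_count) * sample_size,
			   sample_size);
		extract++;
		consumed++;
	}

	/* Release: our reads of the slots complete before they are freed. */
	__atomic_store_n(&ctrl->extract_idx, extract, __ATOMIC_RELEASE);
	return consumed;
}
```

If the intended semantics differ (for example, indices that wrap at the
slot count), the kerneldoc should spell that out.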
