Hello,
Regarding the proposed interface for gathering GPU driver statistics, I would
like to provide feedback based on the following points:
1. Providing per-process GPU information through an interface other than
   /proc would significantly improve the developer experience for
   Flatpak-based applications and tools. Flatpak containers have a
   restricted view of the host's /proc, so sandboxed monitoring tools
   currently have great difficulty gathering cross-process GPU metrics.

2. The current reliance on /proc/<pid>/{fd,fdinfo} requires elevated
   privileges (root or CAP_SYS_PTRACE) to read information about other
   users' processes. This is a major hurdle for non-root users trying to
   detect system-critical issues, such as a VRAM leak in a compositor.

3. While ROCm/amdkfd currently provides per-process VRAM usage, it lacks an
   interface to report the utilization of hardware engines such as Compute
   or SDMA. It would be highly beneficial if the new interface closed this
   gap, so that hardware IP utilization can be tracked consistently across
   both KFD and DRM nodes.
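
To illustrate points 1 and 2, here is a rough Python sketch of the scan
that fdinfo-based tools have to do today. It is only an illustration and
not the actual amdgpu_top or nvtop code; the function names are made up
for this mail, and I am assuming the "drm-" key prefix described in the
kernel's DRM usage stats documentation.

```python
import os


def parse_drm_fdinfo(text):
    """Return the drm-* key/value pairs found in one fdinfo file."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith("drm-"):
            stats[key] = value.strip()
    return stats


def scan_drm_clients(proc_root="/proc"):
    """Walk every <proc_root>/<pid>/fdinfo/* and collect DRM stats.

    Returns (clients, denied): clients maps pid -> list of stat dicts,
    denied lists the pids whose fdinfo directory could not be read.
    """
    clients, denied = {}, []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fdinfo_dir = os.path.join(proc_root, entry, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except PermissionError:
            denied.append(int(entry))  # another user's process
            continue
        except OSError:
            continue  # process exited while we were scanning
        for fd in fds:
            try:
                with open(os.path.join(fdinfo_dir, fd)) as f:
                    stats = parse_drm_fdinfo(f.read())
            except OSError:
                continue  # fd was closed in the meantime
            if "drm-driver" in stats:
                clients.setdefault(int(entry), []).append(stats)
    return clients, denied
```

Run as a regular user, scan_drm_clients() puts every process owned by
another user into "denied", and inside a Flatpak sandbox most host
processes are not listed under /proc at all. Every fd of every process
also has to be opened just to find out whether it is a DRM client.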
Note: I've used Gemini to help structure my thoughts and refine the English in
this mail.
Best regards,
Umio Yasuno
>
>
> On 2/5/26 20:25, Natalie Vock wrote:
>
> > On 2/5/26 19:58, Alex Deucher wrote:
> >
> > > Has anyone given any thought on how to support something like top for
> > > accelerators or GPUs?
> >
> > top for accelerators/GPUs kind of exists already, see [1] or [2].
> > Clearly, this problem has some kind of solution (looking through the code,
> > it seems like they check every fd if it has a DRM fdinfo file associated
> > (which is indeed not particularly efficient)).
> >
> > Maybe it's worth asking the authors of the respective tools for their
> > opinions here?
>
>
> That is a really good point. Adding Maxime Schmitt and Umio Yasuno on CC.
>
> Let's hope I've picked the correct mail addresses.
>
> Christian.
>
> > Natalie
> >
> > [1] https://github.com/Umio-Yasuno/amdgpu_top
> > [2] https://github.com/Syllo/nvtop
> >
> > > We have fdinfo, but using fdinfo requires extra
> > > privileges (CAP_SYS_PTRACE) and there is not a particularly efficient
> > > way to even discover what processes are using the GPU. There is the
> > > clients list in debugfs, but that is also admin only. Tools like ps
> > > and top use /proc/<pid>/stat and statm. Do you think there would be
> > > an appetite for something like /proc/<pid>/drm/stat, statm, etc.?
> > > This would duplicate much of what is in fdinfo, but would be available
> > > to regular users.
> > >
> > > Thanks,
> > >
> > > Alex