On Wed Sep 3, 2025 at 5:23 PM CEST, Tvrtko Ursulin wrote:
> This is another respin of this old work^1 which since v7 is a total rewrite
> and completely changes how the control is done.
I only got some of the patches of the series; can you please send all of
them for subsequent submissions? You may also want to consider resending if
you're not getting a lot of feedback due to that. :)

> On the userspace interface side of things it is the same as before. We have
> drm.weight as an interface, taking integers from 1 to 10000, the same as CPU
> and IO cgroup controllers.

In general, I think it would be good to get GPU vendors to speak up about
what kind of interfaces they're heading towards with firmware schedulers and
potential firmware APIs to control scheduling, especially given that this
will be a uAPI. (Adding a couple of folks to Cc.)

Having that said, I think the basic drm.weight interface is fine and should
work in any case, i.e. with the existing DRM GPU scheduler in both modes and
the upcoming DRM Jobqueue efforts, and it should be generic enough to work
with potential firmware interfaces we may see in the future. (A tiny
userspace usage sketch is appended at the end of this mail for reference.)

Philipp should be talking about the DRM Jobqueue component at XDC (probably
right at this moment).

--

Some more thoughts on the DRM Jobqueue and scheduling:

The idea behind the DRM Jobqueue is to be, as the name suggests, a component
that receives jobs from userspace, handles the dependencies (i.e. dma
fences) and executes the job, e.g. by writing to a firmware-managed software
ring.

It basically does what the GPU scheduler does in 1:1 entity-scheduler mode,
just without all the additional complexity of moving job ownership from one
component to another (i.e. from entity to scheduler, etc.).

With just that, there is no scheduling outside the GPU's firmware scheduler,
of course. However, additional scheduler capabilities, e.g. to support
hardware rings or to manage firmware schedulers that only support a limited
number of software rings (like some Mali GPUs), can be layered on top of
that:

In contrast to the existing GPU scheduler, the idea would be to keep letting
the DRM Jobqueue handle jobs submitted by userspace from end to end (i.e.
let it push to the hardware (or software) ring buffer), but have an
additional component whose only purpose is to orchestrate the DRM Jobqueues
by managing when they are allowed to push to a ring and which ring they
should push to.

This way we get rid of one of the issues with the existing GPU scheduler,
namely that it moves job ownership between components of different lifetimes
(entity and scheduler), which is one of the fundamental hassles to deal
with.
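
To make this a bit more concrete, below is a very rough sketch of how such a
split could look. Nothing of this exists yet; all structure and function
names are made up purely for illustration, the point is only the ownership
model (the jobqueue owns its jobs end to end, the orchestrator only
arbitrates ring access).

/*
 * Illustration only: none of these structures exist and all names are made
 * up. A jobqueue owns its jobs from submission to completion, while the
 * orchestrator only arbitrates when a jobqueue may push and to which ring.
 */
#include <linux/list.h>

struct drm_ring;
struct drm_jobqueue;

struct drm_jobqueue_ops {
	/* Push jobs whose dma-fence dependencies have signalled to @ring. */
	void (*push_ready)(struct drm_jobqueue *jq, struct drm_ring *ring);
};

struct drm_jobqueue {
	const struct drm_jobqueue_ops	*ops;
	struct list_head		jobs;	/* owned here, end to end */
	struct drm_ring			*ring;	/* assigned by the orchestrator */
	struct list_head		link;	/* on the orchestrator's list */
};

struct drm_orchestrator {
	struct list_head		jobqueues;
	/*
	 * The scheduling policy would live here, e.g. a weighted pick based
	 * on the cgroup's drm.weight; the orchestrator never owns any jobs.
	 */
};

/* Called by the orchestrator when it decides @jq may push to @ring. */
static void drm_orchestrator_grant(struct drm_jobqueue *jq,
				   struct drm_ring *ring)
{
	jq->ring = ring;
	jq->ops->push_ready(jq, ring);
}

Whether the drm.weight policy would hook into such an orchestrator or live
somewhere else is of course up for discussion; the sketch is only meant to
show where job ownership would sit.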

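Coming back to the drm.weight interface: for reference, here is roughly what
setting it from userspace would look like. This assumes cgroup v2 mounted at
/sys/fs/cgroup and a child cgroup with the drm controller enabled via
cgroup.subtree_control; the cgroup name ("gpu-batch") and the weight value
are made up for the example.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Cgroup name and weight value are illustrative only. */
	const char *path = "/sys/fs/cgroup/gpu-batch/drm.weight";
	const char *weight = "50\n";	/* valid range is 1 to 10000 */
	int fd, ret = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (write(fd, weight, strlen(weight)) != (ssize_t)strlen(weight)) {
		perror("write");
		ret = 1;
	}

	close(fd);
	return ret;
}

From a userspace point of view this behaves just like cpu.weight and
io.weight, which is what makes the interface attractive.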