On 08/05/2025 07:29, Matthew Brost wrote:
On Fri, May 02, 2025 at 01:32:33PM +0100, Tvrtko Ursulin wrote:
Hi all,
This is another respin of this old work^1 but this version is a total rewrite
and completely changes how the control is done.
This time round the work builds upon the "fair" DRM scheduler work I have posted
recently^2. I am including those patches for completeness and because there were
some tweaks there.
It also means that people only interested in the cgroup portion probably only
need to look at the last seven patches.
And of those seven the last one is an example of how a DRM scheduler based
driver can be wired up with the cgroup controller. So it is quite simple.
To illustrate the runtime effects I ran the Unigine Heaven benchmark in
parallel with the deferredmultisampling Vulkan demo, each in its own cgroup.
First the scheduling weights were the default 100 and 100 respectively, and we
look at the GPU utilisation:
https://people.igalia.com/tursulin/drmcgroup-100-100.png
It is about equal, or thereabouts, since it oscillates at runtime as the
benchmark scenes change.
Then we change drm.weight of the deferredmultisampling cgroup to 1:
https://people.igalia.com/tursulin/drmcgroup-100-1.png
There we see roughly 75:25 in favour of Unigine Heaven (although it also
oscillates, as explained above).
It is important to note that with GPUs the control is still nowhere near as
precise or accurate as with the CPU controller, and that the fair scheduler is
a work in progress. But it works and looks useful.
Going into the implementation, this version is much simpler than before since
the mechanism of time budgets and over-budget signalling is completely gone,
replaced with notifying clients directly about their assigned relative
scheduling weights.
This connects really nicely with the fair DRM scheduler RFC since we can simply
mix the scheduling weight in with the existing scheduling entity priority based
runtime-to-vruntime scaling factors.
It also means there is much less code in the controller itself.
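To make the weight mixing concrete, here is a rough sketch of the idea,
assuming a per-entity priority factor and a cached cgroup weight; the function
and parameter names are made up for illustration and are not the exact code
from the series:

#include <linux/math64.h>

#define DRM_CGROUP_DEFAULT_WEIGHT	100	/* default drm.weight */

/*
 * Illustrative sketch only. A higher cgroup weight makes the virtual runtime
 * advance more slowly, so the fair scheduler picks the entity more often, on
 * top of the existing per-priority scaling.
 */
static u64 foo_runtime_to_vruntime(u64 runtime_ns, u32 prio_factor,
				   u32 cgroup_weight)
{
	return div_u64(runtime_ns * prio_factor * DRM_CGROUP_DEFAULT_WEIGHT,
		       cgroup_weight);
}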
Another advantage is that it is really easy to wire up individual drivers which
use the DRM scheduler in the hardware scheduling mode (i.e. not 1:1 firmware
scheduling).
Admittedly, I have only scanned the series, so it might be easier for you to
elaborate on the above point.
With hardware scheduling mode, the DRM scheduler is essentially just a
dependency tracker that hands off scheduling to the hardware. Are you
suggesting that this series doesn't affect that mode, or does it have
some impact on hardware scheduling (e.g., holding back jobs with
resolved dependencies in the KMD)?
No effect on 1:1 drivers.
(Ignoring perhaps some minor effects from "drm/sched: Queue all free
credits in one worker invocation", or micro effects from removing the
run-queues.)
Follow-up question: aren't most modern drivers and hardware trending
toward hardware scheduling mode? If so, what is the motivation for
making such large changes?
If you are asking about the "fair" scheduler itself, that is covered in
the cover letter for the respective series (benchmark data included).
The goal is to simplify the code base and make it schedule at least as well,
if not better, for all the drivers which are not 1:1.
If you are asking about the cgroup controller (this combined series),
the motivation is in the previous cover letter linked from this one.
There is currently no way to externally control DRM scheduling
priorities, and there are use cases for enabling it. For example, running
something computational in the background and having it compete less with
the foreground tasks for the GPU. Or wiring it up with the window manager's
focused/unfocused window handling to automatically prioritise foreground
tasks via cgroups.
The latter concept was also discussed in the scope of the dmem cgroup
controller, for providing some degree of eviction "protection" to the
foreground task. So it all fits nicely into that sort of usage model.
1:1 drivers can still hook into this (as they were able to throughout the
life of this RFC). How exactly would be up to the individual firmwares.
This RFC would notify the driver "this client has this relative
scheduling weight" and from there it is up to the driver. I.e. those
drivers would not use the drm_sched_cgroup_notify_weight() helper when
registering with the DRM cgroup controller (which this series provides), but
would have to come up with their own.
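To illustrate, and purely as a hypothetical sketch rather than anything this
series defines, such a driver could map the generic weight onto whatever
priority knob its firmware exposes; all names below are invented:

/*
 * Hypothetical sketch only; foo_client and foo_fw_set_client_priority()
 * are invented for illustration.
 */
static void foo_cgroup_weight_notify(struct foo_client *client,
				     unsigned int weight)
{
	/* Map drm.weight (1..10000, default 100) onto a 0..255 fw priority. */
	u8 fw_prio = min_t(unsigned int, 255, weight * 255 / 10000);

	foo_fw_set_client_priority(client, fw_prio);
}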
Regards,
Tvrtko
On the userspace interface side of things it is the same as before. We have
drm.weight as an interface, taking integers from 1 to 10000, the same as the
CPU and IO cgroup controllers.
About the use cases, it is the same as before. With this we would be able to run
a workload in the background and make it compete less with the foreground load.
Be it explicitly, or when integrating with desktop environments, some of which
already have cgroup support for tracking foreground versus background windows or
similar.
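For example, a compositor or session daemon could adjust the weights
programmatically. A minimal userspace sketch, assuming a suitable cgroup
already exists (the path below is made up for illustration):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write the given weight to <cgroup_path>/drm.weight. */
static int set_drm_weight(const char *cgroup_path, unsigned int weight)
{
	char path[256], buf[16];
	int fd, len, ret = 0;

	snprintf(path, sizeof(path), "%s/drm.weight", cgroup_path);
	len = snprintf(buf, sizeof(buf), "%u", weight);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	if (write(fd, buf, len) != len)
		ret = -1;

	close(fd);

	return ret;
}

/* E.g. de-prioritise a background cgroup: */
/*   set_drm_weight("/sys/fs/cgroup/background", 1); */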
I would be really interested if people would attempt to try this out, either
directly with the amdgpu support as provided in the series, or by wiring up
other drivers.
P.S.
About the CC list. It's a large series so I will put most people on Cc only in
the cover letter, as a ping of sorts. Whoever is interested can for now find the
series in the archives.
1)
https://lore.kernel.org/dri-devel/20231024160727.282960-1-tvrtko.ursu...@linux.intel.com/
2)
https://lore.kernel.org/dri-devel/20250425102034.85133-1-tvrtko.ursu...@igalia.com/
Cc: Christian König <christian.koe...@amd.com>
Cc: Danilo Krummrich <d...@kernel.org>
CC: Leo Liu <leo....@amd.com>
Cc: Maíra Canal <mca...@igalia.com>
Cc: Matthew Brost <matthew.br...@intel.com>
Cc: Michal Koutný <mkou...@suse.com>
Cc: Michel Dänzer <michel.daen...@mailbox.org>
Cc: Philipp Stanner <pha...@kernel.org>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-pra...@amd.com>
Cc: Rob Clark <robdcl...@gmail.com>
Cc: Tejun Heo <t...@kernel.org>
Tvrtko Ursulin (23):
drm/sched: Add some scheduling quality unit tests
drm/sched: Add some more scheduling quality unit tests
drm/sched: De-clutter drm_sched_init
drm/sched: Avoid double re-lock on the job free path
drm/sched: Consolidate drm_sched_job_timedout
drm/sched: Consolidate drm_sched_rq_select_entity_rr
drm/sched: Implement RR via FIFO
drm/sched: Consolidate entity run queue management
drm/sched: Move run queue related code into a separate file
drm/sched: Free all finished jobs at once
drm/sched: Account entity GPU time
drm/sched: Remove idle entity from tree
drm/sched: Add fair scheduling policy
drm/sched: Remove FIFO and RR and simplify to a single run queue
drm/sched: Queue all free credits in one worker invocation
drm/sched: Embed run queue singleton into the scheduler
cgroup: Add the DRM cgroup controller
cgroup/drm: Track DRM clients per cgroup
cgroup/drm: Add scheduling weight callback
cgroup/drm: Introduce weight based scheduling control
drm/sched: Add helper for tracking entities per client
drm/sched: Add helper for DRM cgroup controller weight notifications
drm/amdgpu: Register with the DRM scheduling cgroup controller
Documentation/admin-guide/cgroup-v2.rst | 22 +
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 13 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 27 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 8 +-
drivers/gpu/drm/drm_file.c | 11 +
drivers/gpu/drm/scheduler/Makefile | 2 +-
drivers/gpu/drm/scheduler/sched_entity.c | 158 ++--
drivers/gpu/drm/scheduler/sched_fence.c | 2 +-
drivers/gpu/drm/scheduler/sched_internal.h | 126 ++-
drivers/gpu/drm/scheduler/sched_main.c | 570 +++---------
drivers/gpu/drm/scheduler/sched_rq.c | 214 +++++
drivers/gpu/drm/scheduler/tests/Makefile | 3 +-
.../gpu/drm/scheduler/tests/tests_scheduler.c | 815 ++++++++++++++++++
include/drm/drm_drv.h | 26 +
include/drm/drm_file.h | 11 +
include/drm/gpu_scheduler.h | 68 +-
include/linux/cgroup_drm.h | 29 +
include/linux/cgroup_subsys.h | 4 +
init/Kconfig | 5 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/drm.c | 446 ++++++++++
27 files changed, 2024 insertions(+), 574 deletions(-)
create mode 100644 drivers/gpu/drm/scheduler/sched_rq.c
create mode 100644 drivers/gpu/drm/scheduler/tests/tests_scheduler.c
create mode 100644 include/linux/cgroup_drm.h
create mode 100644 kernel/cgroup/drm.c
--
2.48.0