On 16/05/2025 10:54, Philipp Stanner wrote:
On Fri, 2025-05-16 at 10:33 +0100, Tvrtko Ursulin wrote:

On 24/04/2025 10:55, Philipp Stanner wrote:
The waitqueue that ensures that drm_sched_fini() blocks until the
pending_list has become empty could theoretically cause that function to
block for a very long time. That, ultimately, could block userspace
processes and prevent them from being killable through SIGKILL.

When a driver calls drm_sched_fini(), it is safe to assume that all
still pending jobs are not needed anymore anyways. Thus, they can be
cancelled and thereby it can be ensured that drm_sched_fini() will
return relatively quickly.

Implement a new helper to stop all work items / submission except for
the drm_sched_backend_ops.run_job().

Implement a driver callback, kill_fence_context(), that instructs the
driver to kill the fence context associated with this scheduler, thereby
causing all pending hardware fences to be signalled.

Call those new routines in drm_sched_fini() and ensure backwards
compatibility if the new callback is not implemented.
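
To make the idea concrete, a driver-side implementation could look
roughly like the sketch below (not part of this patch; foo_device,
foo_fence, in_flight and fence_list_lock are made-up names for a driver
that keeps its unsignalled hardware fences on a list):

static void foo_kill_fence_context(struct drm_gpu_scheduler *sched)
{
	struct foo_device *fdev = container_of(sched, struct foo_device, sched);
	struct foo_fence *f, *tmp;

	/*
	 * fence_list_lock protects the driver's list of unsignalled
	 * hardware fences; it must not be the fences' own dma_fence lock.
	 */
	spin_lock(&fdev->fence_list_lock);
	list_for_each_entry_safe(f, tmp, &fdev->in_flight, node) {
		list_del_init(&f->node);
		/* Mark the fence as cancelled, then force-signal it. */
		dma_fence_set_error(&f->base, -ECANCELED);
		dma_fence_signal(&f->base);
	}
	spin_unlock(&fdev->fence_list_lock);
}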

Suggested-by: Danilo Krummrich <d...@redhat.com>
Signed-off-by: Philipp Stanner <pha...@kernel.org>
---
 drivers/gpu/drm/scheduler/sched_main.c | 47 +++++++++++++++++---------
 include/drm/gpu_scheduler.h            | 11 ++++++
 2 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index ea82e69a72a8..c2ad6c70bfb6 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1390,31 +1390,46 @@ drm_sched_no_jobs_pending(struct drm_gpu_scheduler *sched)
 	return empty;
 }
 
+/**
+ * drm_sched_cancel_jobs_and_wait - trigger freeing of all pending jobs
+ * @sched: scheduler instance
+ *
+ * Must only be called if &struct drm_sched_backend_ops.kill_fence_context is
+ * implemented.
+ *
+ * Instructs the driver to kill the fence context associated with this scheduler,
+ * thereby signalling all pending fences. This, in turn, will trigger
+ * &struct drm_sched_backend_ops.free_job to be called for all pending jobs.
+ * The function then blocks until all pending jobs have been freed.
+ */
+static inline void
+drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
+{
+	sched->ops->kill_fence_context(sched);
+	wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
+}
+
 /**
  * drm_sched_fini - Destroy a gpu scheduler
  *
  * @sched: scheduler instance
  *
- * Tears down and cleans up the scheduler.
- *
- * Note that this function blocks until all the fences returned by
- * &struct drm_sched_backend_ops.run_job have been signalled.
+ * Tears down and cleans up the scheduler. Might leak memory if
+ * &struct drm_sched_backend_ops.kill_fence_context is not implemented.
  */
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
 	struct drm_sched_entity *s_entity;
 	int i;
 
-	/*
-	 * Jobs that have neither been scheduled or which have timed out are
-	 * gone by now, but jobs that have been submitted through
-	 * backend_ops.run_job() and have not yet terminated are still pending.
-	 *
-	 * Wait for the pending_list to become empty to avoid leaking those jobs.
-	 */
-	drm_sched_submission_and_timeout_stop(sched);
-	wait_event(sched->pending_list_waitque, drm_sched_no_jobs_pending(sched));
-	drm_sched_free_stop(sched);
+	if (sched->ops->kill_fence_context) {
+		drm_sched_submission_and_timeout_stop(sched);
+		drm_sched_cancel_jobs_and_wait(sched);
+		drm_sched_free_stop(sched);
+	} else {
+		/* We're in "legacy free-mode" and ignore potential mem leaks */
+		drm_sched_wqueue_stop(sched);
+	}
 
 	for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
 		struct drm_sched_rq *rq = sched->sched_rq[i];
@@ -1502,7 +1517,7 @@ bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched)
   EXPORT_SYMBOL(drm_sched_wqueue_ready);
  /**
- * drm_sched_wqueue_stop - stop scheduler submission
+ * drm_sched_wqueue_stop - stop scheduler submission and freeing

Looks like the kerneldoc corrections (below too) belong to the previous
patch. Irrelevant if you decide to squash them though.

    * @sched: scheduler instance
    *
 * Stops the scheduler from pulling new jobs from entities. It also stops
@@ -1518,7 +1533,7 @@ void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched)
   EXPORT_SYMBOL(drm_sched_wqueue_stop);
  /**
- * drm_sched_wqueue_start - start scheduler submission
+ * drm_sched_wqueue_start - start scheduler submission and freeing
    * @sched: scheduler instance
    *
 * Restarts the scheduler after drm_sched_wqueue_stop() has stopped it.
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d0b1f416b4d9..8630b4a26f10 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -509,6 +509,17 @@ struct drm_sched_backend_ops {
            * and it's time to clean it up.
         */
        void (*free_job)(struct drm_sched_job *sched_job);
+
+       /**
+        * @kill_fence_context: kill the fence context belonging to this scheduler

Which fence context would that be? ;)

Also, "fence context" would be new terminology at the gpu_scheduler.h API
level. You could call it ->sched_fini() or similar to signify at which
point in the API it gets called, and then the fact that it takes sched as
a parameter would be natural.
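
I.e. something along the lines of (name illustrative only):

	void (*sched_fini)(struct drm_gpu_scheduler *sched);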

We also probably want some commentary on the topic of indefinite (or at
least very long) blocking of thread exit / SIGINT/TERM/KILL handling.
The cover letter touches upon that problem but I don't see you address
it. Is the idea to let drivers shoot themselves in the foot or what?

You didn't seriously just write that, did you?

Yes, my intention clearly has been since September to make things
worse, more ambiguous and destroy drivers. That's why I'm probably the
only individual after Sima (who added some of the FIXMEs) who ever
bothered documenting all this mess here, and warning people about the
five hundred pitfalls, like three different start and stop functions,
implicit refcount rules and God knows which other insane hacks that
simply move driver problems into common infrastructure.

Reconsider your attitude.

I don't know what kind of attitude you think I was expressing. If the
last sentence was one hyperbole too much for you, then sorry. The point
was in the paragraph ending with that sentence: to document that the
callback must not block, or, if it has to, to discuss for how long. And
to perhaps discuss whether the scheduler code should impose a limit.
Otherwise the indefinite block on thread exit from the cover letter can
still happen. I don't see how raising those points is a criticism of
your overall work. (The fact that we don't see eye to eye regarding more
state machine vs the ->cancel_job() alternative aside.)
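
For instance, just to sketch the idea (the 10 second value is arbitrary,
not a proposal for a specific limit), the wait could be bounded and a
misbehaving driver flagged:

static void drm_sched_cancel_jobs_and_wait(struct drm_gpu_scheduler *sched)
{
	sched->ops->kill_fence_context(sched);
	if (!wait_event_timeout(sched->pending_list_waitque,
				drm_sched_no_jobs_pending(sched),
				msecs_to_jiffies(10000)))
		dev_warn(sched->dev,
			 "Timed out waiting for pending jobs to be freed\n");
}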

Regards,

Tvrtko




+        *
+        * Needed to cleanly tear the scheduler down in drm_sched_fini(). This
+        * callback will cause all hardware fences to be signalled by the driver,
+        * which, ultimately, ensures that all jobs get freed before teardown.
+        *
+        * This callback is optional, but it is highly recommended to implement it.
+        */
+       void (*kill_fence_context)(struct drm_gpu_scheduler *sched);
   };
  /**
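
For completeness, a driver opting in to the new teardown path would then
wire the callback into its ops table alongside the existing hooks,
roughly like this (the foo_* handlers are hypothetical):

static const struct drm_sched_backend_ops foo_sched_ops = {
	.run_job            = foo_run_job,
	.timedout_job       = foo_timedout_job,
	.free_job           = foo_free_job,
	.kill_fence_context = foo_kill_fence_context,
};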


