On Thu, 2026-01-08 at 09:30 +0100, Philipp Stanner wrote:
> drm_sched_fini() contained a hack to work around a race in amdgpu.
> According to AMD, the hack should not be necessary anymore. In case
> there were still undetected users,
> 
> commit 975ca62a014c ("drm/sched: Add warning for removing hack in drm_sched_fini()")
> 
> had added a warning one release cycle ago.
> 
> Thus, the hack can now be safely removed.
> 
> Remove the hack.
> 
> Signed-off-by: Philipp Stanner <[email protected]>
> ---
> As hinted at in the commit message, I'd like to cozily queue this one
> up for the next merge window, since we have been printing that warning
> since the last merge window already.
> 
> If someone has concerns I'm also happy to delay this patch for a few
> more releases.
> ---

Any objections from anyone?

Can I get an RB?


P.

>  drivers/gpu/drm/scheduler/sched_main.c | 38 +-------------------------
>  1 file changed, 1 insertion(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 1d4f1b822e7b..381c1694a12e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1416,48 +1416,12 @@ static void drm_sched_cancel_remaining_jobs(struct drm_gpu_scheduler *sched)
>   */
>  void drm_sched_fini(struct drm_gpu_scheduler *sched)
>  {
> -     struct drm_sched_entity *s_entity;
>       int i;
>  
>       drm_sched_wqueue_stop(sched);
>  
> -     for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) {
> -             struct drm_sched_rq *rq = sched->sched_rq[i];
> -
> -             spin_lock(&rq->lock);
> -             list_for_each_entry(s_entity, &rq->entities, list) {
> -                     /*
> -                      * Prevents reinsertion and marks job_queue as idle,
> -                      * it will be removed from the rq in drm_sched_entity_fini()
> -                      * eventually
> -                      *
> -                      * FIXME:
> -                      * This lacks the proper spin_lock(&s_entity->lock) and
> -                      * is, therefore, a race condition. Most notably, it
> -                      * can race with drm_sched_entity_push_job(). The lock
> -                      * cannot be taken here, however, because this would
> -                      * lead to lock inversion -> deadlock.
> -                      *
> -                      * The best solution probably is to enforce the life
> -                      * time rule of all entities having to be torn down
> -                      * before their scheduler. Then, however, locking could
> -                      * be dropped alltogether from this function.
> -                      *
> -                      * For now, this remains a potential race in all
> -                      * drivers that keep entities alive for longer than
> -                      * the scheduler.
> -                      *
> -                      * The READ_ONCE() is there to make the lockless read
> -                      * (warning about the lockless write below) slightly
> -                      * less broken...
> -                      */
> -                     if (!READ_ONCE(s_entity->stopped))
> -                             dev_warn(sched->dev, "Tearing down scheduler with active entities!\n");
> -                     s_entity->stopped = true;
> -             }
> -             spin_unlock(&rq->lock);
> +     for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++)
>               kfree(sched->sched_rq[i]);
> -     }
>  
>       /* Wakeup everyone stuck in drm_sched_entity_flush for this scheduler */
>       wake_up_all(&sched->job_scheduled);
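
For anyone wondering what the lifetime rule being relied on here looks
like from the driver side, a minimal sketch (the mydrv_* names and
struct layout are made up for illustration; only the two
drm_sched_entity_fini() / drm_sched_fini() calls are real API):

```c
/* Hypothetical driver teardown, illustrating the lifetime rule that
 * drm_sched_fini() now relies on: every entity must be torn down
 * before the scheduler it was created against.
 */
static void mydrv_device_fini(struct mydrv_device *mdev)
{
	/* 1) Tear down all entities first. This stops the entities and
	 *    removes them from the scheduler's run queues.
	 */
	drm_sched_entity_fini(&mdev->entity);

	/* 2) Only then tear down the scheduler itself. With no live
	 *    entities left, there is nothing to race with and nothing
	 *    to warn about.
	 */
	drm_sched_fini(&mdev->sched);
}
```

Drivers that keep entities alive past their scheduler are the cases the
removed hack used to (racily) paper over.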
