sched: Consolidate drm_sched_job_timedout

Philipp Stanner Mon, 12 May 2025 05:54:15 -0700

On Fri, 2025-04-25 at 11:20 +0100, Tvrtko Ursulin wrote:
> Reduce to one spin_unlock for hopefully a little bit clearer flow in
> the
> function. It may appear that there is a behavioural change with the
> drm_sched_start_timeout_unlocked() now not being called if there were
> initially no jobs on the pending list, and then some appeared after
> unlock, however if the code would rely on the TDR handler restarting
> itself then it would fail to do that if the job arrived on the
> pending
> list after the check.
> 
> Also fix one stale comment while touching the function.


Same here, that's a good candidate for a separate patch / series.

P.

> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@igalia.com>
> Cc: Christian König <christian.koe...@amd.com>
> Cc: Danilo Krummrich <d...@kernel.org>
> Cc: Matthew Brost <matthew.br...@intel.com>
> Cc: Philipp Stanner <pha...@kernel.org>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 37 +++++++++++++-----------
> --
>  1 file changed, 18 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index a45b02fd2af3..a26cc11c8ade 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -516,38 +516,37 @@ static void drm_sched_job_begin(struct
> drm_sched_job *s_job)
>  
>  static void drm_sched_job_timedout(struct work_struct *work)
>  {
> -     struct drm_gpu_scheduler *sched;
> +     struct drm_gpu_scheduler *sched =
> +             container_of(work, struct drm_gpu_scheduler,
> work_tdr.work);
> +     enum drm_gpu_sched_stat status;
>       struct drm_sched_job *job;
> -     enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
> -
> -     sched = container_of(work, struct drm_gpu_scheduler,
> work_tdr.work);
>  
>       /* Protects against concurrent deletion in
> drm_sched_get_finished_job */
>       spin_lock(&sched->job_list_lock);
>       job = list_first_entry_or_null(&sched->pending_list,
>                                      struct drm_sched_job, list);
> -
>       if (job) {
>               /*
>                * Remove the bad job so it cannot be freed by
> concurrent
> -              * drm_sched_cleanup_jobs. It will be reinserted
> back after sched->thread
> -              * is parked at which point it's safe.
> +              * drm_sched_get_finished_job. It will be reinserted
> back after
> +              * scheduler worker is stopped at which point it's
> safe.
>                */
>               list_del_init(&job->list);
> -             spin_unlock(&sched->job_list_lock);
> +     }
> +     spin_unlock(&sched->job_list_lock);
>  
> -             status = job->sched->ops->timedout_job(job);
> +     if (!job)
> +             return;
>  
> -             /*
> -              * Guilty job did complete and hence needs to be
> manually removed
> -              * See drm_sched_stop doc.
> -              */
> -             if (sched->free_guilty) {
> -                     job->sched->ops->free_job(job);
> -                     sched->free_guilty = false;
> -             }
> -     } else {
> -             spin_unlock(&sched->job_list_lock);
> +     status = job->sched->ops->timedout_job(job);
> +
> +     /*
> +      * Guilty job did complete and hence needs to be manually
> removed. See
> +      * documentation for drm_sched_stop.
> +      */
> +     if (sched->free_guilty) {
> +             job->sched->ops->free_job(job);
> +             sched->free_guilty = false;
>       }
>  
>       if (status != DRM_GPU_SCHED_STAT_ENODEV)

Re: [RFC v4 05/16] drm/sched: Consolidate drm_sched_job_timedout

Reply via email to