On Thu, Feb 5, 2026 at 9:22 AM Pierre-Eric Pelloux-Prayer
<[email protected]> wrote:
>
>
>
> Le 30/01/2026 à 18:30, Alex Deucher a écrit :
> > We only want to stop the work queues, not mess with the
> > fences, etc.
> >
> > v2: add the job back to the pending list.
> > v3: return the proper job status so scheduler adds the
> >      job back to the pending list
> >
> > Signed-off-by: Alex Deucher <[email protected]>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    | 6 ++----
> >   2 files changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index e69ab8a923e31..a5b43d57c7b05 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -6313,7 +6313,7 @@ static void amdgpu_device_halt_activities(struct amdgpu_device *adev,
> >                       if (!amdgpu_ring_sched_ready(ring))
> >                               continue;
> >
> > -                     drm_sched_stop(&ring->sched, job ? &job->base : NULL);
> > +                     drm_sched_wqueue_stop(&ring->sched);
> >
> >                       if (need_emergency_restart)
> >                               amdgpu_job_stop_all_jobs_on_sched(&ring->sched);
> > @@ -6397,7 +6397,7 @@ static int amdgpu_device_sched_resume(struct list_head *device_list,
> >                       if (!amdgpu_ring_sched_ready(ring))
> >                               continue;
> >
> > -                     drm_sched_start(&ring->sched, 0);
> > +                     drm_sched_wqueue_start(&ring->sched);
> >               }
> >
> >               if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index df06a271bdf99..cd0707737a29b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -92,7 +92,6 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> >       struct drm_wedge_task_info *info = NULL;
> >       struct amdgpu_task_info *ti = NULL;
> >       struct amdgpu_device *adev = ring->adev;
> > -     enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_RESET;
> >       int idx, r;
> >
> >       if (!drm_dev_enter(adev_to_drm(adev), &idx)) {
> > @@ -147,8 +146,6 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> >                               ring->sched.name);
> >                       drm_dev_wedged_event(adev_to_drm(adev),
> >                                            DRM_WEDGE_RECOVERY_NONE, info);
> > -                     /* This is needed to add the job back to the pending list */
> > -                     status = DRM_GPU_SCHED_STAT_NO_HANG;
> >                       goto exit;
> >               }
> >               dev_err(adev->dev, "Ring %s reset failed\n", ring->sched.name);
> > @@ -184,7 +181,8 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> >   exit:
> >       amdgpu_vm_put_task_info(ti);
> >       drm_dev_exit(idx);
> > -     return status;
> > +     /* This is needed to add the job back to the pending list */
> > +     return DRM_GPU_SCHED_STAT_NO_HANG;
>
> This part seems unrelated to the patch and is overwriting what was done
> in patch 1/12.

Patch 1 fixes the pending list handling for per queue resets.  This
patch reworks the adapter reset path to match the behavior of the per
queue reset path, so after this patch we can safely return
DRM_GPU_SCHED_STAT_NO_HANG in both cases.  Previously the adapter
reset path called drm_sched_stop()/drm_sched_start(), which handled
re-adding the job to the pending list.  Now that it only parks and
unparks the work queues via drm_sched_wqueue_stop()/start(), the
timeout handler has to return DRM_GPU_SCHED_STAT_NO_HANG so the
scheduler re-adds the job itself.
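
To illustrate, here is a rough sketch of the two approaches (condensed
for clarity; in amdgpu these calls live in the reset helpers rather
than inline in the timeout handler):

    /* Old adapter reset path: drm_sched_stop() with the bad job puts
     * it back on the pending list and drm_sched_start() restarts the
     * scheduler, so the handler can return the reset status.
     */
    drm_sched_stop(&ring->sched, &job->base);
    /* ... reset the adapter ... */
    drm_sched_start(&ring->sched, 0);
    return DRM_GPU_SCHED_STAT_RESET;

    /* New path (this series): only stop/start the submission work
     * queues; the pending list is untouched, so the scheduler re-adds
     * the job when the handler returns DRM_GPU_SCHED_STAT_NO_HANG.
     */
    drm_sched_wqueue_stop(&ring->sched);
    /* ... reset the adapter ... */
    drm_sched_wqueue_start(&ring->sched);
    return DRM_GPU_SCHED_STAT_NO_HANG;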

Alex

>
> Pierre-Eric
>
>
> >   }
> >
> >   int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
