On Mon, Nov 17, 2025 at 11:57:44AM -0800, Niranjana Vishwanathapura wrote:
> On Thu, Oct 16, 2025 at 01:48:21PM -0700, Matthew Brost wrote:
> > Add helpers to see if the scheduler is stopped and a job's signaled state.
> > Expected to be used driver side on recovery and debug flows.
> > 
> > Signed-off-by: Matthew Brost <[email protected]>
> > ---
> > drivers/gpu/drm/scheduler/sched_main.c |  4 ++--
> > include/drm/gpu_scheduler.h            | 32 ++++++++++++++++++++++++--
> > 2 files changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 46119aacb809..69bd6e482268 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -344,7 +344,7 @@ drm_sched_rq_select_entity_fifo(struct drm_gpu_scheduler *sched,
> >  */
> > static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> > {
> > -   if (!READ_ONCE(sched->pause_submit))
> > +   if (!drm_sched_is_stopped(sched))
> >             queue_work(sched->submit_wq, &sched->work_run_job);
> > }
> > 
> > @@ -354,7 +354,7 @@ static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
> >  */
> > static void drm_sched_run_free_queue(struct drm_gpu_scheduler *sched)
> > {
> > -   if (!READ_ONCE(sched->pause_submit))
> > +   if (!drm_sched_is_stopped(sched))
> >             queue_work(sched->submit_wq, &sched->work_free_job);
> > }
> > 
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 7f31eba3bd61..d1a2d7f61c1d 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -700,6 +700,17 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> > 
> > /* Inlines */
> > 
> > +/**
> > + * drm_sched_is_stopped() - Check if the DRM scheduler is stopped
> > + * @sched: DRM scheduler
> > + *
> > + * Return: True if sched is stopped, False otherwise
> > + */
> > +static inline bool drm_sched_is_stopped(struct drm_gpu_scheduler *sched)
> > +{
> > +   return READ_ONCE(sched->pause_submit);
> > +}
> > +
> > /**
> >  * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator state
> >  * @sched: DRM scheduler associated with pending job iterator
> > @@ -716,7 +727,7 @@ __drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> >             .sched = sched,
> >     };
> > 
> > -   WARN_ON(!READ_ONCE(sched->pause_submit));
> > +   WARN_ON(!drm_sched_is_stopped(sched));
> >     return iter;
> > }
> 
> NIT...instead of modifying the functions added in previous patch, may be this
> patch should go in first and the previous patch can be added after that with
> drm_sched_is_stopped() usage?
> 

Yes, I think that would be better ordering. Will fix.

> > 
> > @@ -724,7 +735,7 @@ __drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> > static inline void
> > __drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter iter)
> > {
> > -   WARN_ON(!READ_ONCE(iter.sched->pause_submit));
> > +   WARN_ON(!drm_sched_is_stopped(iter.sched));
> > }
> > 
> > DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> > @@ -750,4 +761,21 @@ class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t *_T
> >             list_for_each_entry((__job), &(__sched)->pending_list, list)    \
> >                     for_each_if(!(__entity) || (__job)->entity == (__entity))
> > 
> > +/**
> > + * drm_sched_job_is_signaled() - DRM scheduler job is signaled
> > + * @job: DRM scheduler job
> > + *
> > + * Determine if DRM scheduler job is signaled. DRM scheduler should be stopped
> > + * to obtain a stable snapshot of state.
> > + *
> > + * Return: True if job is signaled, False otherwise
> > + */
> > +static inline bool drm_sched_job_is_signaled(struct drm_sched_job *job)
> > +{
> > +   struct drm_sched_fence *s_fence = job->s_fence;
> > +
> > +   WARN_ON(!drm_sched_is_stopped(job->sched));
> > +   return dma_fence_is_signaled(&s_fence->finished);
> > +}
> 
> NIT..In patch#4 where xe driver uses this function in couple places,
> I am seeing originally it checks if the s_fence->parent is signaled
> instead of &s_fence->finished as done here.
> I do see below message in the 's_fence->parent' kernel-doc,
> "We signal the &drm_sched_fence.finished fence once parent is signalled."
> So, probably it is fine, but just want to ensure.
> 

It is more or less the same check. Technically the hardware fence
(parent) can signal before the software fence (finished), but that race
window is small enough that in practice it should never be hit. Still, I
could make this function more robust by checking the parent fence too;
that is probably better. Let me change this.

Matt

> Niranjana
> 
> > +
> > #endif
> > -- 
> > 2.34.1
> > 
