On Tue, Nov 18, 2025 at 09:52:40AM -0800, Matthew Brost wrote:
On Fri, Nov 14, 2025 at 05:25:47PM -0800, Niranjana Vishwanathapura wrote:
On Thu, Oct 16, 2025 at 01:48:20PM -0700, Matthew Brost wrote:
> Stop open coding pending job list in drivers. Add pending job list
> iterator which safely walks DRM scheduler list asserting DRM scheduler
> is stopped.
>
> v2:
> - Fix checkpatch (CI)
> v3:
> - Drop locked version (Christian)
>
> Signed-off-by: Matthew Brost <[email protected]>
> ---
> include/drm/gpu_scheduler.h | 52 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 52 insertions(+)
>
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index fb88301b3c45..7f31eba3bd61 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -698,4 +698,56 @@ void drm_sched_entity_modify_sched(struct 
drm_sched_entity *entity,
>                               struct drm_gpu_scheduler **sched_list,
>                               unsigned int num_sched_list);
>
> +/* Inlines */
> +
> +/**
> + * struct drm_sched_pending_job_iter - DRM scheduler pending job iterator 
state
> + * @sched: DRM scheduler associated with pending job iterator
> + */
> +struct drm_sched_pending_job_iter {
> +  struct drm_gpu_scheduler *sched;
> +};
> +
> +/* Drivers should never call this directly */
> +static inline struct drm_sched_pending_job_iter
> +__drm_sched_pending_job_iter_begin(struct drm_gpu_scheduler *sched)
> +{
> +  struct drm_sched_pending_job_iter iter = {
> +          .sched = sched,
> +  };
> +
> +  WARN_ON(!READ_ONCE(sched->pause_submit));
> +  return iter;
> +}
> +
> +/* Drivers should never call this directly */
> +static inline void
> +__drm_sched_pending_job_iter_end(const struct drm_sched_pending_job_iter 
iter)
> +{
> +  WARN_ON(!READ_ONCE(iter.sched->pause_submit));
> +}

May be instead of these inline functions, we can add the code in a '({' block
in the below DEFINE_CLASS itself to avoid drivers from calling these inline
funcions? Though I agree these inline functions makes it cleaner to read.


I'm not sure we can just call code inline from DEFINE_CLASS, rather only
functions.

I do see some examples of it.
https://elixir.bootlin.com/linux/v6.18-rc6/source/drivers/gpu/drm/xe/xe_validation.h#L167
https://elixir.bootlin.com/linux/v6.18-rc6/source/drivers/gpio/gpiolib.h#L229

But DEFINE_CLASS also inserts static inline functions here. So, not super 
critical.


> +
> +DEFINE_CLASS(drm_sched_pending_job_iter, struct drm_sched_pending_job_iter,
> +       __drm_sched_pending_job_iter_end(_T),
> +       __drm_sched_pending_job_iter_begin(__sched),
> +       struct drm_gpu_scheduler *__sched);
> +static inline void *
> +class_drm_sched_pending_job_iter_lock_ptr(class_drm_sched_pending_job_iter_t 
*_T)
> +{ return _T; }
> +#define class_drm_sched_pending_job_iter_is_conditional false
> +
> +/**
> + * drm_sched_for_each_pending_job() - Iterator for each pending job in 
scheduler
> + * @__job: Current pending job being iterated over
> + * @__sched: DRM scheduler to iterate over pending jobs
> + * @__entity: DRM scheduler entity to filter jobs, NULL indicates no filter
> + *
> + * Iterator for each pending job in scheduler, filtering on an entity, and
> + * enforcing scheduler is fully stopped
> + */
> +#define drm_sched_for_each_pending_job(__job, __sched, __entity)          \
> +  scoped_guard(drm_sched_pending_job_iter, (__sched))                     \
> +          list_for_each_entry((__job), &(__sched)->pending_list, list)     \
> +                  for_each_if(!(__entity) || (__job)->entity == (__entity))
> +

I am comparing it with DEFINE_CLASS usage in ttm driver here.
It looks like the body of this macro (where we call list_for_each_entry()),
doesn't use the drm_sched_pending_job_iter at all. So, looks like the only
reason we are using a DEFINE_CLASS with scoped_guard here is for those
WARN_ON() messages at the beginning and end of loop iteration, which is not
fully fool proof. Right?

The drm_sched_pending_job_iter is for futuring proofing (e.g., if we
need more information than drm_gpu_scheduler, we have iterator
structure).

The define class is purpose is to ensure at the start of iterator and
end of the iterator the scheduler is paused which is only time we (DRM
scheduler maintainers) have agreed it is safe for driver to look at the
pending list. FWIW, this caught some bugs in Xe VF restore
implementation.

I wonder if we really need DEFINE_CLASS here for that, though I am not
against using it.


So yes, I think a DEFINE_CLASS is apporiate here to implement the
iterator.


Ok, thanks.

Nianjana

Matt

Niranjana

> #endif
> --
> 2.34.1
>

Reply via email to