If a job from a ready entity needs more credits than are currently
available, drm_sched_run_job_work() (a work item) simply returns and
doesn't reschedule itself. The scheduler is only woken up again when the
next job gets pushed with drm_sched_entity_push_job().

If someone submits a job that needs too many credits and doesn't submit
more jobs afterwards, this would lead to the scheduler never pulling the
too-expensive job, effectively hanging forever.

Document this problem as a FIXME.

Signed-off-by: Philipp Stanner <[email protected]>
---
 drivers/gpu/drm/scheduler/sched_main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 492e8af639db..eaf8d17b2a66 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1237,6 +1237,16 @@ static void drm_sched_run_job_work(struct work_struct *w)
 
 	/* Find entity with a ready job */
 	entity = drm_sched_select_entity(sched);
+	/*
+	 * FIXME:
+	 * The entity can be NULL when the scheduler currently has no capacity
+	 * (credits) for more jobs. If that happens, the work item terminates
+	 * itself here, without rescheduling itself.
+	 *
+	 * It only gets started again in drm_sched_entity_push_job(). IOW, the
+	 * scheduler might hang forever if a job that needs too many credits
+	 * gets submitted to an entity and no other, subsequent jobs are.
+	 */
 	if (!entity) {
 		/*
 		 * Either no more work to do, or the next ready job needs more
-- 
2.49.0
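
For illustration, the stall described in the commit message can be reproduced in a small user-space toy model. This is only a sketch: every toy_* name is hypothetical and none of it is kernel code. It assumes, following the FIXME text, that submission is the only event that kicks the work item, and it deliberately does not model any other path in the real scheduler that might queue the work again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_JOBS 8

/* Toy model only; none of these names exist in the kernel. */
struct toy_sched {
	unsigned int credit_limit;
	unsigned int credits_in_use;
	unsigned int queue[TOY_MAX_JOBS];	/* credits needed per queued job */
	size_t head, tail;			/* FIFO indices, no wrap-around */
	unsigned int jobs_started;
};

/* Models the work item: start at most one job, else return silently. */
static void toy_run_job_work(struct toy_sched *s)
{
	unsigned int need;

	if (s->head == s->tail)
		return;		/* no queued job */

	need = s->queue[s->head];
	if (need > s->credit_limit - s->credits_in_use)
		return;		/* too expensive right now, and nothing re-kicks us */

	s->head++;
	s->credits_in_use += need;
	s->jobs_started++;
}

/* Models job submission: in this model, the only place that kicks the work item. */
static bool toy_push_job(struct toy_sched *s, unsigned int credits)
{
	if (s->tail == TOY_MAX_JOBS)
		return false;

	s->queue[s->tail++] = credits;
	toy_run_job_work(s);
	return true;
}

/* Models job completion: credits are returned, the work item is NOT kicked. */
static void toy_job_done(struct toy_sched *s, unsigned int credits)
{
	s->credits_in_use -= credits;
}

/* Walks through the stall and returns the number of jobs started. */
static unsigned int toy_demo(void)
{
	struct toy_sched s = { .credit_limit = 4 };

	toy_push_job(&s, 3);		/* job A fits and starts */
	toy_push_job(&s, 3);		/* job B needs 3, only 1 is free: stays queued */
	assert(s.jobs_started == 1);

	toy_job_done(&s, 3);		/* A finishes and its credits are free again ... */
	assert(s.jobs_started == 1);	/* ... but B is still stuck in the queue */

	toy_push_job(&s, 1);		/* a later submission kicks the work item */
	return s.jobs_started;		/* B has started; C waits for another kick */
}
```

In this model job B only makes progress because a third job happens to be pushed. Without that push, B would sit in the queue indefinitely even though enough credits are free, which is the situation the FIXME documents.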
