On 02/11/2023 22:46, Luben Tuikov wrote:
Eliminate drm_sched_run_job_queue_if_ready() and instead just call
drm_sched_run_job_queue() in drm_sched_free_job_work(). The problem is that
the former function uses drm_sched_select_entity() to determine whether the
scheduler has an entity ready in one of its run-queues, and in the case of
Round-Robin (RR) scheduling, drm_sched_rq_select_entity_rr() does just that:
it selects the _next_ ready entity, sets up the run-queue and completion, and
returns that entity. The FIFO scheduling algorithm is unaffected.

Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), in
the case of RR scheduling this results in calling select_entity() twice, which
may skip a ready entity when more than one entity is ready. Fix this by
eliminating the if_ready() variant.
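
To illustrate with a toy sketch (simplified, not the actual kernel structures
or code): an RR pick has the side effect of advancing the run-queue cursor, so
picking once just to test readiness and then again to dispatch throws a ready
entity's turn away.

/* Toy model of an RR run-queue; names are made up for illustration. */
struct toy_rq {
	int cursor;		/* index of the last-picked entity */
	int nr_entities;
	bool ready[8];		/* readiness flag per entity */
};

/* Mimics the cursor-advancing behaviour of drm_sched_rq_select_entity_rr():
 * returns the next ready entity after the cursor and moves the cursor to it. */
static int toy_rr_pick(struct toy_rq *rq)
{
	int i;

	for (i = 1; i <= rq->nr_entities; i++) {
		int idx = (rq->cursor + i) % rq->nr_entities;

		if (rq->ready[idx]) {
			rq->cursor = idx;	/* side effect */
			return idx;
		}
	}
	return -1;
}

With two ready entities A and B, the pick done by if_ready() returns A and
moves the cursor to it, and the second pick in drm_sched_run_job_work() then
returns B, so A is skipped in this pass.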

A Fixes: tag is missing since the regression already landed.


Signed-off-by: Luben Tuikov <ltuiko...@gmail.com>
---
  drivers/gpu/drm/scheduler/sched_main.c | 14 ++------------
  1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 98b2ad54fc7071..05816e7cae8c8b 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1040,16 +1040,6 @@ drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
  }
  EXPORT_SYMBOL(drm_sched_pick_best);
-/**
- * drm_sched_run_job_queue_if_ready - enqueue run-job work if ready
- * @sched: scheduler instance
- */
-static void drm_sched_run_job_queue_if_ready(struct drm_gpu_scheduler *sched)
-{
-       if (drm_sched_select_entity(sched))
-               drm_sched_run_job_queue(sched);
-}
-
  /**
   * drm_sched_free_job_work - worker to call free_job
   *
@@ -1069,7 +1059,7 @@ static void drm_sched_free_job_work(struct work_struct *w)
                sched->ops->free_job(cleanup_job);
                drm_sched_free_job_queue_if_done(sched);
-               drm_sched_run_job_queue_if_ready(sched);
+               drm_sched_run_job_queue(sched);

It works, but it is a bit wasteful, causing needless CPU wake-ups with a potentially empty queue, both here and in drm_sched_run_job_work() below.

What would be the problem with having a "peek" type helper? It would be easy to do it in a single spinlock section instead of a drop and re-acquire.
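
For instance, something like the below (only a sketch; the helper name is made up, and the FIFO path would need an equivalent check against its rb-tree):

/* Hypothetical "peek" helper: report whether any entity in the run-queue
 * is ready, without advancing the RR cursor, in one locked section. */
static bool drm_sched_rq_has_ready_entity(struct drm_sched_rq *rq)
{
	struct drm_sched_entity *entity;
	bool ready = false;

	spin_lock(&rq->lock);
	list_for_each_entry(entity, &rq->entities, list) {
		if (drm_sched_entity_is_ready(entity)) {
			ready = true;
			break;
		}
	}
	spin_unlock(&rq->lock);

	return ready;
}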

What is even the point of having the re-queue here _inside_ the if (cleanup_job) block? See https://lists.freedesktop.org/archives/dri-devel/2023-November/429037.html. Because of the lock drop and re-acquire, I don't see that it makes sense to make the potential re-queue depend on the existence of the current finished job.
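
Roughly what I have in mind (just a sketch, not even compile tested; field and helper names as in the current code, from memory):

static void drm_sched_free_job_work(struct work_struct *w)
{
	struct drm_gpu_scheduler *sched =
		container_of(w, struct drm_gpu_scheduler, work_free_job);
	struct drm_sched_job *job;

	if (READ_ONCE(sched->pause_submit))
		return;

	job = drm_sched_get_cleanup_job(sched);
	if (job) {
		sched->ops->free_job(job);
		drm_sched_free_job_queue_if_done(sched);
	}

	/* Re-queue regardless of whether a finished job was reaped in
	 * this pass, since the lock was dropped in between anyway. */
	drm_sched_run_job_queue(sched);
}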

Also, what is the point of re-queuing the run-job queue from the free worker at all?

(I suppose re-queuing the _free_ worker itself is needed in the current design, albeit inefficient.)

Regards,

Tvrtko

        }
  }
@@ -1127,7 +1117,7 @@ static void drm_sched_run_job_work(struct work_struct *w)
        }
        wake_up(&sched->job_scheduled);
-       drm_sched_run_job_queue_if_ready(sched);
+       drm_sched_run_job_queue(sched);
  }
/**

base-commit: 6fd9487147c4f18ad77eea00bd8c9189eec74a3e
