Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq

Andrey Grodzovsky Wed, 24 Aug 2022 08:07:33 -0700

On 2022-08-24 04:29, Michel Dänzer wrote:

On 2022-08-22 22:09, Andrey Grodzovsky wrote:

Problem: Given many entities competing for the same rq on
the same scheduler, unacceptably long wait times are observed
for some jobs stuck in the rq before being picked up
(seen using GPUVis).
The issue is due to the Round Robin policy used by the scheduler
to pick the next entity for execution. Under stress
of many entities and long job queues within an entity, some
jobs can be stuck for a very long time in their entity's
queue before being popped from the queue and executed,
while for other entities with smaller job queues a job
might execute earlier even though that job arrived later
than the job in the long queue.


Fix:
Add a FIFO selection policy for entities in the rq: choose the next entity
on the rq in such an order that if a job on one entity arrived
earlier than a job on another entity, the first job will start
executing earlier regardless of the length of the entity's job
queue.

Instead of ordering based on when jobs are added, might it be possible to order 
them based on when they become ready to run?

Otherwise it seems possible to e.g. submit a large number of inter-dependent 
jobs at once, and they would all run before any jobs from another queue get a 
chance.

While any of them is not ready (i.e. still has an unfulfilled dependency), that job will not be chosen to run (see drm_sched_entity_is_ready). In this scenario, if an earlier job from entity E1 is not ready to run, it will be skipped and a later job from entity E2 (which is ready) will be chosen to run, so the E1 job is not blocking the E2 job. The moment the E1 job does become ready, it seems logical to me to let it run ASAP, as by now it has spent the most time of anyone waiting for execution, and I don't think it matters that part of this time was because it waited for a dependency job to complete its run.
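
To make the selection rule above concrete, here is a minimal, self-contained sketch. It is not the patch itself and does not use the real scheduler structures: struct toy_entity, its fields, and toy_pick_fifo() are hypothetical names invented for illustration, and the "ready" flag merely stands in for whatever drm_sched_entity_is_ready reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a scheduler entity (NOT struct drm_sched_entity).
 * head_submit_ts: submission time of the job at the head of the entity's queue.
 * ready: true if that head job has no unfulfilled dependency.
 */
struct toy_entity {
	uint64_t head_submit_ts;
	bool ready;
};

/*
 * FIFO pick with readiness skip: among the entities whose head job is
 * ready, return the index of the one whose head job was submitted
 * earliest. Entities that are not ready are skipped, so they do not
 * block later-submitted jobs on other entities. Returns -1 if nothing
 * is ready.
 */
static int toy_pick_fifo(const struct toy_entity *rq, size_t n)
{
	int best = -1;

	assert(rq != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!rq[i].ready)
			continue; /* unfulfilled dependency: skip for now */
		if (best < 0 || rq[i].head_submit_ts < rq[best].head_submit_ts)
			best = (int)i;
	}
	return best;
}
```

With E1 at index 0 (head job submitted at time 100, not ready) and E2 at index 1 (head job submitted at time 200, ready), toy_pick_fifo() returns 1, so E2 runs. Once E1's flag flips to ready, the same call returns 0, because E1's job has the earliest submission time regardless of how long it spent waiting on its dependency.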

Andrey
