I am assuming this is the only code change and that no locks are taken in drm_sched_entity_push_job -

What happens if process A runs drm_sched_entity_push_job after this code has executed in the (dying) process B, while there are still jobs in the queue (the wait_event terminated prematurely) and the entity has already been removed from the rq? In that case 'first' in drm_sched_entity_push_job will be false, so the entity will not be reinserted into the rq entity list and no wakeup will be triggered for the new job process A is pushing.
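
For reference, the relevant tail of drm_sched_entity_push_job, reconstructed roughly from the patch quoted further down in this thread; when 'first' is false the rq re-insert and the wakeup are simply skipped:

    WRITE_ONCE(entity->last_user, current->group_leader);
    first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);

    /* first job wakes up scheduler */
    if (first) {
        /* Add the entity to the run queue */
        spin_lock(&entity->rq_lock);
        drm_sched_rq_add_entity(entity->rq, entity);
        spin_unlock(&entity->rq_lock);
        drm_sched_wakeup(entity->rq->sched);
    }
    /* first == false: the job sits in the queue, but nothing re-adds the
     * entity to the rq or wakes the scheduler, so the new job is stranded */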


Another issue below -

Andrey


On 08/14/2018 03:05 AM, Christian König wrote:
I would rather avoid taking the lock in the hot path.

How about this:

    /* For a killed process disable any further IB enqueue right now */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
        grab_lock();
        drm_sched_rq_remove_entity(entity->rq, entity);
        if (READ_ONCE(entity->last_user) != NULL)

This condition will be true because right at this point process A did drm_sched_entity_push_job->WRITE_ONCE(entity->last_user, current->group_leader); so the line below executes and the entity is reinserted into the rq. Say also that the entity's job queue is empty at that moment: for process A 'first' will be true, so drm_sched_entity_push_job->drm_sched_rq_add_entity(entity->rq, entity) will also take place, causing a double insertion of the entity into the rq list.

Andrey

            drm_sched_rq_add_entity(entity->rq, entity);
        drop_lock();
    }
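
(Putting the proposal back together in one piece for readability - grab_lock()/drop_lock() are placeholders for whatever lock ends up protecting the rq list:)

    /* For a killed process disable any further IB enqueue right now */
    last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
    if ((!last_user || last_user == current->group_leader) &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
        grab_lock();
        drm_sched_rq_remove_entity(entity->rq, entity);
        /* Andrey's point above: if process A's push_job runs around here
         * on an empty job queue, it sets last_user and ('first' being
         * true) adds the entity itself, while the re-add below inserts
         * it again */
        if (READ_ONCE(entity->last_user) != NULL)
            drm_sched_rq_add_entity(entity->rq, entity);
        drop_lock();
    }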

Christian.

Am 13.08.2018 um 18:43 schrieb Andrey Grodzovsky:

Attached.

If the general idea in the patch is OK I can think of a test (and maybe add it to the libdrm amdgpu tests) to actually simulate this scenario with two forked concurrent processes working on the same entity's job queue, one dying while the other keeps pushing to the same queue. For now I only tested it with a normal boot and running multiple glxgears instances concurrently - which doesn't really exercise this code path, since I think each of them works on its own FD.
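
A rough user-space skeleton of that test could look like the sketch below; open_shared_render_node() and push_job_on_shared_entity() are hypothetical stand-ins for the libdrm amdgpu plumbing (context creation and CS submission on the FD/context shared across fork()):

    #include <signal.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical helpers, to be filled in with the real libdrm amdgpu
     * calls; both processes must use the same FD/context so their jobs
     * land on the same scheduler entity. */
    static int open_shared_render_node(void) { return -1; /* TODO */ }
    static void push_job_on_shared_entity(int fd) { (void)fd; /* TODO */ }

    int main(void)
    {
        int fd = open_shared_render_node();
        pid_t child = fork();

        if (child == 0) {
            /* Child keeps pushing jobs to the shared entity forever */
            for (;;)
                push_job_on_shared_entity(fd);
        }

        /* Parent pushes a few jobs and then dies via SIGKILL, so
         * drm_sched_entity_flush() runs with a non-empty job queue that
         * the child keeps refilling */
        for (int i = 0; i < 64; i++)
            push_job_on_shared_entity(fd);
        kill(getpid(), SIGKILL);

        waitpid(child, NULL, 0);    /* not reached */
        return 0;
    }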

Andrey


On 08/10/2018 09:27 AM, Christian König wrote:
Crap, yeah indeed that needs to be protected by some lock.

Going to prepare a patch for that,
Christian.

Am 09.08.2018 um 21:49 schrieb Andrey Grodzovsky:

Reviewed-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>


But I still have questions about entity->last_user (I didn't notice this before) -

Looks to me like there is a race condition with its current usage. Say process A is preempted right after doing drm_sched_entity_flush->cmpxchg(...); now process B, working on the same entity (forked), is inside drm_sched_entity_push_job: it writes its PID to entity->last_user and also executes drm_sched_rq_add_entity. Then process A runs again and executes drm_sched_rq_remove_entity, inadvertently removing process B's entity from its scheduler rq.

Looks to me like instead we should lock together the entity->last_user accesses and the adds/removals of the entity to/from the rq.
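
A minimal sketch of what that could look like, assuming entity->rq_lock is reused to also cover last_user (the existing function and field names are from the current code; extending the lock's scope is the assumption here):

    /* push side (drm_sched_entity_push_job) */
    first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
    spin_lock(&entity->rq_lock);
    entity->last_user = current->group_leader;
    if (first)
        drm_sched_rq_add_entity(entity->rq, entity);
    spin_unlock(&entity->rq_lock);
    if (first)
        drm_sched_wakeup(entity->rq->sched);

    /* flush side (drm_sched_entity_flush), same lock, so a concurrent
     * push_job cannot slip in between the last_user check and the remove */
    spin_lock(&entity->rq_lock);
    if (entity->last_user == current->group_leader &&
        (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
        drm_sched_rq_remove_entity(entity->rq, entity);
    spin_unlock(&entity->rq_lock);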

Andrey


On 08/06/2018 10:18 AM, Nayan Deshmukh wrote:
I forgot about this since we started discussing possible scenarios of processes and threads.

In any case, this check is redundant. Acked-by: Nayan Deshmukh <nayan26deshm...@gmail.com>

Nayan

On Mon, Aug 6, 2018 at 7:43 PM Christian König <ckoenig.leichtzumer...@gmail.com> wrote:

    Ping. Any objections to that?

    Christian.

    Am 03.08.2018 um 13:08 schrieb Christian König:
     > That is superfluous now.
    >
     > Signed-off-by: Christian König <christian.koe...@amd.com>
    > ---
    >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 5 -----
    >   1 file changed, 5 deletions(-)
    >
     > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
     > index 85908c7f913e..65078dd3c82c 100644
     > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
     > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
     > @@ -590,11 +590,6 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
     >       if (first) {
     >               /* Add the entity to the run queue */
     >               spin_lock(&entity->rq_lock);
     > -             if (!entity->rq) {
     > -                     DRM_ERROR("Trying to push to a killed entity\n");
     > -                     spin_unlock(&entity->rq_lock);
     > -                     return;
     > -             }
     >               drm_sched_rq_add_entity(entity->rq, entity);
     >               spin_unlock(&entity->rq_lock);
     >               drm_sched_wakeup(entity->rq->sched);






_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

