From: David Rosca <[email protected]>

The DRM scheduler tracks who last used an entity and, when that process
is killed, blocks all further submissions to that entity.

The problem is that we didn't track who initially created an entity, so
when a process accidentally leaked its file descriptor to a child and
that child got killed, we killed the parent's entities.

Avoid that and instead initialize the entity's last user to the creating
process on entity creation, so a killed process that never used the
entity no longer matches it. This also allows dropping the extra NULL
check.

v2: still use cmpxchg
v3: improve the commit message

Signed-off-by: David Rosca <[email protected]>
Signed-off-by: Christian König <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4568
Reviewed-by: Alex Deucher <[email protected]>
CC: [email protected]
---
 drivers/gpu/drm/scheduler/sched_entity.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 5a4697f636f2..3e2f83dc3f24 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
        entity->guilty = guilty;
        entity->num_sched_list = num_sched_list;
        entity->priority = priority;
+       entity->last_user = current->group_leader;
        /*
         * It's perfectly valid to initialize an entity without having a valid
         * scheduler attached. It's just not valid to use the scheduler before it
@@ -302,7 +303,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
 
        /* For a killed process disallow further enqueueing of jobs. */
        last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
-       if ((!last_user || last_user == current->group_leader) &&
+       if (last_user == current->group_leader &&
            (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
                drm_sched_entity_kill(entity);
 
-- 
2.43.0
