On 14/10/2025 15:24, Christian König wrote:
From: David Rosca <[email protected]>

The DRM scheduler tracks who last used an entity and, when that process
is killed, blocks all further submissions to that entity.

The problem is that we didn't track who initially created an entity, so
when a process accidentally leaked its file descriptor to a child and
that child got killed, we killed the parent's entities.

Avoid that and instead initialize the entity's last user on entity
creation.

Signed-off-by: David Rosca <[email protected]>
Signed-off-by: Christian König <[email protected]>
CC: [email protected]

Fixes: 43bce41cf48e ("drm/scheduler: only kill entity if last user is killed v2")
Cc: <[email protected]> # v4.19+

Perhaps all the way back to there?

---
  drivers/gpu/drm/scheduler/sched_entity.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 5a4697f636f2..3e2f83dc3f24 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
        entity->guilty = guilty;
        entity->num_sched_list = num_sched_list;
        entity->priority = priority;
+       entity->last_user = current->group_leader;
        /*
         * It's perfectly valid to initialize an entity without having a valid
         * scheduler attached. It's just not valid to use the scheduler before it
@@ -302,7 +303,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
        /* For a killed process disallow further enqueueing of jobs. */
        last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
-       if ((!last_user || last_user == current->group_leader) &&
+       if (last_user == current->group_leader &&
            (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
                drm_sched_entity_kill(entity);

Hm, but is it not just a band aid for a specific usage pattern?

I.e. the exiting process can still kill the shared entity; it just needs to be the last one that used it. So for a use case where two threads might legitimately be sharing an entity, it comes down to chance whether an exiting thread kills the shared entity or not.
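
To make the concern concrete, here is a minimal userspace sketch of how I read the last_user logic after this patch. The names (struct proc, struct entity, entity_push_job() and so on) are hypothetical stand-ins, not the kernel API, and a plain pointer compare-and-clear stands in for the atomic cmpxchg(). It also assumes that pushing a job records the submitter as the last user, which is my understanding of the scheduler and is not shown in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a process (the kernel uses current->group_leader). */
struct proc {
	int pid;
	bool killed;	/* models PF_EXITING with exit_code == SIGKILL */
};

/* Hypothetical stand-in for struct drm_sched_entity. */
struct entity {
	const struct proc *last_user;
	bool stopped;
};

/* With the patch applied, the creator is recorded as the initial last user. */
static void entity_init(struct entity *e, const struct proc *creator)
{
	assert(e != NULL && creator != NULL);
	e->last_user = creator;
	e->stopped = false;
}

/* Assumption: every submission overwrites last_user with the submitter. */
static void entity_push_job(struct entity *e, const struct proc *submitter)
{
	e->last_user = submitter;
}

/* Models the patched check in the flush path, called by 'cur' on close/exit. */
static void entity_flush(struct entity *e, const struct proc *cur)
{
	const struct proc *last_user = e->last_user;

	/* Non-atomic model of cmpxchg(&e->last_user, cur, NULL). */
	if (last_user == cur)
		e->last_user = NULL;

	if (last_user == cur && cur->killed)
		e->stopped = true;
}

/* Leaked fd: the child never submits and gets killed. Returns e.stopped. */
static bool leaked_fd_case(void)
{
	struct proc parent = { 1, false }, child = { 2, true };
	struct entity e;

	entity_init(&e, &parent);
	entity_flush(&e, &child);
	return e.stopped;
}

/* Shared entity: both users submit, then user b is killed. Returns e.stopped. */
static bool shared_case(bool killed_user_submits_last)
{
	struct proc a = { 1, false }, b = { 2, true };
	struct entity e;

	entity_init(&e, &a);
	if (killed_user_submits_last) {
		entity_push_job(&e, &a);
		entity_push_job(&e, &b);
	} else {
		entity_push_job(&e, &b);
		entity_push_job(&e, &a);
	}
	entity_flush(&e, &b);
	return e.stopped;
}
```

In this model leaked_fd_case() leaves the entity running, which is the case the patch fixes, while shared_case() stops the entity only when the killed user happened to be the one that submitted last.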

Not saying I know of such patterns, but I am also not sure they should be precluded at the scheduler level. Nor is defining sensible and compatible semantics easy at this point.

So if what I wrote is correct, perhaps just explain it in the commit message, or even in a code comment.

Regards,

Tvrtko
