On Tue, Oct 14, 2025 at 11:41 AM Christian König
<[email protected]> wrote:
>
> From: David Rosca <[email protected]>
>
> The DRM scheduler tracks who last uses an entity and when that process
> is killed blocks all further submissions to that entity.
>
> The problem is that we didn't tracked who initialy created an entity, so

initially

> when an process accidentially leaked its file descriptor to a child and

accidentally

> that child got killed we killed the parents entities.

that child got killed, we killed the parent's entities.

>
> Avoid that and instead initialize the entities last user on entity
> creation.
>
> Signed-off-by: David Rosca <[email protected]>
> Signed-off-by: Christian König <[email protected]>
> CC: [email protected]

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4568

With the above fixes,
Reviewed-by: Alex Deucher <[email protected]>

> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 5a4697f636f2..3e2f83dc3f24 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>         entity->guilty = guilty;
>         entity->num_sched_list = num_sched_list;
>         entity->priority = priority;
> +       entity->last_user = current->group_leader;
>         /*
>          * It's perfectly valid to initialize an entity without having a valid
>          * scheduler attached. It's just not valid to use the scheduler before it
> @@ -302,7 +303,7 @@ long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout)
>
>         /* For a killed process disallow further enqueueing of jobs. */
>         last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
> -       if ((!last_user || last_user == current->group_leader) &&
> +       if (last_user == current->group_leader &&
>             (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>                 drm_sched_entity_kill(entity);
>
> --
> 2.43.0
>
