On 15.10.25 10:23, Philipp Stanner wrote:
> On Tue, 2025-10-14 at 16:24 +0200, Christian König wrote:
>> From: David Rosca <[email protected]>
>>
>> The DRM scheduler tracks who last uses an entity and when that process
>> is killed blocks all further submissions to that entity.
>>
>> The problem is that we didn't tracked who initialy created an entity, so
>
> s/tracked/track
>
>> when an process accidentially leaked its file descriptor to a child and
>
> s/an/a
>
>> that child got killed we killed the parents entities.
>
> s/parents/parent's
>
>>
>> Avoid that and instead initialize the entities last user on entity
>> creation.
>>
>> Signed-off-by: David Rosca <[email protected]>
>> Signed-off-by: Christian König <[email protected]>
>> CC: [email protected]
>> ---
>> drivers/gpu/drm/scheduler/sched_entity.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
>> b/drivers/gpu/drm/scheduler/sched_entity.c
>> index 5a4697f636f2..3e2f83dc3f24 100644
>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>> @@ -70,6 +70,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>> entity->guilty = guilty;
>> entity->num_sched_list = num_sched_list;
>> entity->priority = priority;
>> + entity->last_user = current->group_leader;
>> /*
>> * It's perfectly valid to initialize an entity without having a valid
>> * scheduler attached. It's just not valid to use the scheduler before
>> it
>> @@ -302,7 +303,7 @@ long drm_sched_entity_flush(struct drm_sched_entity
>> *entity, long timeout)
>>
>> /* For a killed process disallow further enqueueing of jobs. */
>> last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
>> - if ((!last_user || last_user == current->group_leader) &&
>> + if (last_user == current->group_leader &&
>
> It's not super clear from the commit message why the NULL check is
> being removed. Previously entities could have been killed if there was
> no last user. That's not desired anymore. Why?
The reason that we don't need the NULL check any more is because we now don't
encounter the NULL pointer any more.
In other words the pointer is now always initialized, even when the entity was
never used.
I've added another sentence to the commit message, but I'm not sure how to
better describe that. Is that sufficient? If not suggestions welcome.
Thanks,
Christian.
>
>
> P.
>
>> (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
>> drm_sched_entity_kill(entity);
>>
>