Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider: - [Critical] Removing the `entity_idle` completion introduces a critical race condition between the scheduler worker thread and the entity teardown path, leading to Use-After-Free (UAF) vulnerabilities and SPSC queue corruption. --
commit f72f2810652760301a56b6a9cbcac8c26715a4a4 Author: Philipp Stanner <[email protected]> drm/sched: Remove entity->entity_idle The completion entity->entity_idle only existed because the entity was not properly locked through it's spinlock. The completion served to inform waiters about whether the entity is actually idle, which is something locking (previously added to drm_sched_entity_is_idle()) can fully achieve. > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > b/drivers/gpu/drm/scheduler/sched_entity.c > index cb03d6a36578b..23536dcfa96a9 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c [ ... ] > @@ -285,12 +281,7 @@ void drm_sched_entity_kill(struct drm_sched_entity > *entity) > spin_lock(&entity->lock); > entity->stopped = true; > drm_sched_rq_remove_entity(entity->rq, entity); > - spin_unlock(&entity->lock); > > - /* Make sure this entity is not used by the scheduler at the moment */ > - wait_for_completion(&entity->entity_idle); > - > - spin_lock(&entity->lock); [Severity: High] Does removing this completion introduce a regression involving a race condition between the scheduler worker thread and the entity teardown path? If a userspace application submits gpu jobs and is simultaneously killed with a fatal signal, wait_event_killable() in drm_sched_entity_flush() can return -ERESTARTSYS immediately, bypassing the idle wait. Then drm_sched_entity_kill() executes and returns without waiting for the worker thread, allowing the entity to be freed while the scheduler worker thread is concurrently extracting a job. This could lead to concurrent spsc_queue_pop() calls in drm_sched_entity_pop_job() violating spsc queue invariants. It could also lead to a use-after-free when the worker thread accesses the freed entity pointer: drivers/gpu/drm/scheduler/sched_main.c:drm_sched_run_job_work() { ... sched_job = drm_sched_entity_pop_job(entity); if (!sched_job) { ... trace_drm_sched_job_run(sched_job, entity); ... } > prev = entity->last_scheduled; > dma_fence_get(prev); > spin_unlock(&entity->lock); -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=5
