Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider: - [High] Use-After-Free in `drm_sched_entity_error()` due to lockless clearing of `entity->last_scheduled` in `drm_sched_entity_fini()`. - [Low] Incomplete preparational refactoring: `drm_sched_entity_kill()` calls `spsc_queue_pop()` locklessly. --
commit 8128ccd719adfa75d48bc49990edb4b623aa9607 Author: Philipp Stanner <[email protected]> drm/sched: Protect entity->last_scheduled with spinlock This commit replaces the RCU mechanisms and memory barriers protecting entity->last_scheduled with a spinlock to improve maintainability. It also begins guarding spsc_queue_pop() with the lock in preparation for replacing the queue with a locked list. > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c > b/drivers/gpu/drm/scheduler/sched_entity.c > index 4ebb513255edd..2959feb08ac7e 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -202,10 +201,10 @@ int drm_sched_entity_error(struct drm_sched_entity > *entity) > struct dma_fence *fence; > int r; > > - rcu_read_lock(); > - fence = rcu_dereference(entity->last_scheduled); > + spin_lock(&entity->lock); > + fence = entity->last_scheduled; > r = fence ? fence->error : 0; > - rcu_read_unlock(); > + spin_unlock(&entity->lock); > > return r; > } [ ... ] > @@ -292,8 +291,10 @@ void drm_sched_entity_kill(struct drm_sched_entity > *entity) > wait_for_completion(&entity->entity_idle); > > /* The entity is guaranteed to not be used by the scheduler */ > - prev = rcu_dereference_check(entity->last_scheduled, true); > + spin_lock(&entity->lock); > + prev = entity->last_scheduled; > dma_fence_get(prev); > + spin_unlock(&entity->lock); > while ((job = drm_sched_entity_queue_pop(entity))) { > struct drm_sched_fence *s_fence = job->s_fence; [Severity: Low] The commit message notes that spsc_queue_pop() should be guarded with the lock as a preparational step, but drm_sched_entity_queue_pop() (which wraps spsc_queue_pop()) is called here in the while loop without holding entity->lock. Could this lead to lockless list manipulation when the underlying queue data structure is later replaced with a locked list? [ ... ] > @@ -388,8 +389,12 @@ void drm_sched_entity_fini(struct drm_sched_entity > *entity) > entity->dependency = NULL; > } > > - dma_fence_put(rcu_dereference_check(entity->last_scheduled, true)); > - RCU_INIT_POINTER(entity->last_scheduled, NULL); > + dma_fence_put(entity->last_scheduled); > + /* > + * Normally all users should be gone now, but since drm_sched has > + * experienced many layering violations in the past, better be safe. > + */ > + WRITE_ONCE(entity->last_scheduled, NULL); > drm_sched_entity_stats_put(entity->stats); > } [Severity: High] Could this lockless pointer clearing lead to a use-after-free in drm_sched_entity_error()? Since drm_sched_entity_error() now uses spin_lock() instead of rcu_read_lock(), a concurrent reader could acquire entity->lock between the dma_fence_put() and WRITE_ONCE() calls here. If an RCU grace period completes before the reader dereferences fence->error, might the reader access a freed fence? Since spinlocks under PREEMPT_RT can sleep, RCU grace periods aren't implicitly blocked. Would it be safer to acquire entity->lock around the pointer clearing sequence here? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
