On Tue, 2025-04-22 at 13:07 +0100, Tvrtko Ursulin wrote:
> 
> On 22/04/2025 12:13, Danilo Krummrich wrote:
> > On Tue, Apr 22, 2025 at 11:39:11AM +0100, Tvrtko Ursulin wrote:
> > > The question I raised is whether there are other drivers which
> > > manage to clean up everything correctly (like the mock scheduler
> > > does), but trigger that warning. Maybe there are not and maybe the
> > > mock scheduler is the only false positive.
> > 
> > So far the scheduler simply does not give any guideline on how to
> > address the problem, hence every driver simply does something (or
> > nothing, effectively ignoring the problem). This is what we want to
> > fix.
> > 
> > The mock scheduler keeps its own list of pending jobs and on tear
> > down stops the scheduler's workqueues, traverses its own list and
> > eventually frees the pending jobs without updating the scheduler's
> > internal pending list.
> > 
> > So yes, it does avoid memory leaks, but it also leaves the
> > scheduler's internal structures in an invalid state, i.e. the
> > pending list of the scheduler has pointers to already freed memory.
> > 
> > What if drm_sched_fini() starts touching the pending list? Then
> > you'd end up with UAF bugs with this implementation. We cannot
> > invalidate the scheduler's internal structures and yet call
> > scheduler functions - e.g. drm_sched_fini() - subsequently.
> > 
> > Hence, the current implementation of the mock scheduler is
> > fundamentally flawed. And so would be *every* driver that still has
> > entries within the scheduler's pending list.
> > 
> > This is not a false positive, it already caught a real bug -- in
> > the mock scheduler.
> 
> To avoid further splitting hairs on whether real bugs need to be able
> to manifest or not, let's move past this with the conclusion that
> there are two potential things to do here:
> 
> First one is to either send separately or include in this series
> something like:
> 
> diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> index f999c8859cf7..7c4df0e890ac 100644
> --- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
> @@ -300,6 +300,8 @@ void drm_mock_sched_fini(struct drm_mock_scheduler *sched)
>  		drm_mock_sched_job_complete(job);
>  	spin_unlock_irqrestore(&sched->lock, flags);
>  
> +	drm_sched_fini(&sched->base);
> +
>  	/*
>  	 * Free completed jobs and jobs not yet processed by the DRM scheduler
>  	 * free worker.
> @@ -311,8 +313,6 @@ void drm_mock_sched_fini(struct drm_mock_scheduler *sched)
>  
>  	list_for_each_entry_safe(job, next, &list, link)
>  		mock_sched_free_job(&job->base);
> -
> -	drm_sched_fini(&sched->base);
>  }
>  
>  /**
> 
> That should satisfy the requirement to "clear" memory about to be
> freed and be 100% compliant with the drm_sched_fini() kerneldoc
> (guideline b).
> 
> But the new warning from 3/5 here will still be there AFAICT, and
> would you then agree it is a false positive?
> 
> Secondly, the series should modify all drivers (including the unit
> tests) which are known to trigger this false positive.
There is no false positive; I just stated that in my other reply.

That said, I agree that the unit tests should be a second user of this
code, in addition to Nouveau. They should then also provide the
fence_context_kill() callback, though.

And it seems we also agree that the memory leaks must be solved
centrally in drm_sched_fini().

P.

> 
> Regards,
> 
> Tvrtko
> 
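
---

As an illustration of the fence_context_kill() idea mentioned above,
here is a minimal sketch of what such a callback could look like for
the mock scheduler. The callback's actual name and signature are not
shown in this thread, and the "job_list" field name is an assumption;
only the locking, the "base" member, and the
drm_mock_sched_job_complete() call are taken from the quoted diff.

	/*
	 * Hypothetical sketch: signature and job_list are assumed, not
	 * taken from the series under discussion.
	 */
	static void
	mock_sched_fence_context_kill(struct drm_gpu_scheduler *gsched)
	{
		/* "base" is the embedded drm_gpu_scheduler, as in the diff. */
		struct drm_mock_scheduler *sched =
			container_of(gsched, struct drm_mock_scheduler, base);
		struct drm_mock_sched_job *job, *next;
		unsigned long flags;

		/*
		 * Signal the hardware fence of every job the mock scheduler
		 * still tracks, so that drm_sched_fini() can subsequently
		 * free the pending list centrally instead of the driver
		 * freeing jobs behind the scheduler's back.
		 */
		spin_lock_irqsave(&sched->lock, flags);
		list_for_each_entry_safe(job, next, &sched->job_list, link)
			drm_mock_sched_job_complete(job);
		spin_unlock_irqrestore(&sched->lock, flags);
	}

Wired up as such a callback, this would let drm_sched_fini() observe
all fences as signalled and reclaim the pending list itself, which is
the central cleanup both sides appear to agree on above.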