On Wed, 2025-11-12 at 14:31 +0100, Boris Brezillon wrote:
> Hi Philipp,
> 
> On Wed, 12 Nov 2025 13:38:00 +0100
> Philipp Stanner <[email protected]> wrote:
> 
> > On Wed, 2025-11-12 at 13:17 +0100, Boris Brezillon wrote:
> > > From: Ashley Smith <[email protected]>
> > > 
> > > The timeout logic provided by drm_sched leads to races when we try
> > > to suspend it while the drm_sched workqueue queues more jobs. Let's
> > > overhaul the timeout handling in panthor to have our own delayed work
> > > that's resumed/suspended when a group is resumed/suspended. When an
> > > actual timeout occurs, we still report it through drm_sched by
> > > calling drm_sched_fault(). Otherwise, the drm_sched timeout is
> > > disabled (set to MAX_SCHEDULE_TIMEOUT), which leaves us in control
> > > of how we protect modifications to the timer.
> > > 
> > > One issue is that we call drm_sched_suspend_timeout() from both
> > > queue_run_job() and tick_work(), which can race because
> > > drm_sched_suspend_timeout() takes no lock. Another issue is in
> > > queue_run_job(): if the group is not scheduled, we suspend the
> > > timeout again, which undoes what drm_sched_job_begin() did when
> > > calling drm_sched_start_timeout(). As a result, the timeout is not
> > > reset when a job finishes.
> > > 
> > >   
> > 
> > […]
> > 
> > > +
> > > +static void
> > > +queue_reset_timeout_locked(struct panthor_queue *queue)
> > > +{
> > > + lockdep_assert_held(&queue->fence_ctx.lock);
> > > +
> > > + if (queue->timeout.remaining != MAX_SCHEDULE_TIMEOUT) {
> > > +         mod_delayed_work(queue->scheduler.timeout_wq,  
> > 
> > Here you are interfering with the scheduler's internals again, aren't
> > you? I think we agreed that we don't want to do such things anymore,
> > didn't we?
> 
> We're not really touching drm_gpu_scheduler's internals; we're just
> retrieving the timeout workqueue we passed at init time:
> panthor_queue::timeout is panthor-internal, not drm_sched-internal.
> 
> This being said, I agree we should use ptdev->reset.wq instead of
> retrieving the timeout workqueue through the drm_gpu_scheduler object.
> 
> Just to be clear, the goal of this patch is to bypass the
> drm_gpu_scheduler timeout logic entirely, so we can have our own thing
> that's not racy (see below).

OK. timeout_wq sharing is intended and allowed, so if that's what
you're doing, good. But I agree that accessing the wq through the
driver's struct is then cleaner and more obviously correct.

> 
> > 
> > You could write a proper drm_sched API function which serves your
> > usecase.
> 
> It's not really lack of support for our use case that drives this
> change, but rather the fact that the current helpers are racy for
> drivers that have a 1:1 entity:sched relationship with queues that can
> be scheduled out behind drm_gpu_scheduler's back.

And you also can't stop drm_sched to prevent races?

> 
> > 
> > Or could your driver maybe return DRM_GPU_SCHED_STAT_NO_HANG in case
> > an interrupt actually fires?
> 
> I don't think it helps, see below.
> 
> > 
> > > +                          &queue->timeout.work,
> > > +                          msecs_to_jiffies(JOB_TIMEOUT_MS));
> > > + }
> > > +}
> > > +
> > > +static bool
> > > +group_can_run(struct panthor_group *group)
> > > +{
> > > + return group->state != PANTHOR_CS_GROUP_TERMINATED &&
> > > +        group->state != PANTHOR_CS_GROUP_UNKNOWN_STATE &&
> > > +        !group->destroyed && group->fatal_queues == 0 &&
> > > +        !group->timedout;
> > > +}
> > > +
> > >   
> > 
> > […]
> > 
> > > +queue_suspend_timeout(struct panthor_queue *queue)
> > > +{
> > > + spin_lock(&queue->fence_ctx.lock);
> > > + queue_suspend_timeout_locked(queue);
> > > + spin_unlock(&queue->fence_ctx.lock);
> > > +}
> > > +
> > > +static void
> > > +queue_resume_timeout(struct panthor_queue *queue)
> > > +{
> > > + spin_lock(&queue->fence_ctx.lock);
> > > +
> > > + if (queue_timeout_is_suspended(queue)) {
> > > +         mod_delayed_work(queue->scheduler.timeout_wq,  
> > 
> > There is drm_sched_resume_timeout(). Why can it not be used?
> 
> Because it's racy. I don't remember all the details, but IIRC, it had
> to do with job insertion calling drm_sched_start_timeout() while we're
> calling drm_sched_resume_timeout() from cs_slot_reset_locked(). We
> tried to make it work, but we gave up at some point and went for a
> driver-specific timer, because ultimately what we want is a per-queue
> timeout that we can pause/resume without drm_sched interfering when new
> jobs are queued to our ring buffers: we resume when the execution
> context (AKA group) is scheduled in, and we pause when this execution
> context is scheduled out.
> 
> That's the very reason we set drm_gpu_scheduler::timeout to
> MAX_SCHEDULE_TIMEOUT at init time (i.e., timeout disabled) and never
> touch it again. When our driver-internal timer expires, we forward the
> information to the drm_sched layer by calling drm_sched_fault().
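> 
> To give a rough idea of the resulting shape (simplified sketch, field
> names follow the patch, the handler name is just illustrative; not the
> literal code):
> 
>     /* drm_gpu_scheduler::timeout stays at MAX_SCHEDULE_TIMEOUT, so
>      * drm_sched never arms its own timer. Our delayed work is the only
>      * timeout source; when it fires, we hand the fault back to
>      * drm_sched, which then runs its usual timedout_job handling.
>      */
>     static void queue_timeout_work(struct work_struct *work)
>     {
>             struct panthor_queue *queue =
>                     container_of(work, struct panthor_queue,
>                                  timeout.work.work);
> 
>             drm_sched_fault(&queue->scheduler);
>     }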

That all sounds... stressful ;)

As you know, I only learned a few weeks ago about your group scheduler
on top of drm_sched. I wish I had heard about it when it was
implemented; we might have come up with the idea for drm_jobqueue
sooner.

> 
> > 
> > > +                          &queue->timeout.work,
> > > +                          queue->timeout.remaining);
> > > +
> > > +         queue->timeout.remaining = MAX_SCHEDULE_TIMEOUT;
> > > + }
> > > +
> > > + spin_unlock(&queue->fence_ctx.lock);
> > > +}
> > > +
> > >   
> > 
> > […]
> > 
> > >  
> > > @@ -3270,6 +3379,11 @@ queue_timedout_job(struct drm_sched_job *sched_job)
> > >  
> > >   queue_start(queue);
> > >  
> > > + /* We already flagged the queue as faulty, make sure we don't get
> > > +  * called again.
> > > +  */
> > > + queue->scheduler.timeout = MAX_SCHEDULE_TIMEOUT;
> > > +
> > >   return DRM_GPU_SCHED_STAT_RESET;  
> > 
> > DRM_GPU_SCHED_STAT_NO_HANG instead of just modifying the scheduler's
> > internal data??
> 
> 
>       queue->scheduler.timeout = MAX_SCHEDULE_TIMEOUT;
> 
> is a leftover from a previous version. We shouldn't have to modify that
> here because the timeout is initialized to MAX_SCHEDULE_TIMEOUT and
> never touched again.

So you agree that it can be removed in v8?


P.
