On Fri, 30 Jan 2026 20:56:31 +0100 Thomas Hellström 
<[email protected]> wrote:

> > 
> > > --- a/mm/hmm.c
> > > +++ b/mm/hmm.c
> > > @@ -674,6 +674,13 @@ int hmm_range_fault(struct hmm_range *range)
> > >  			return -EBUSY;
> > >  		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
> > >  				      &hmm_walk_ops, &hmm_vma_walk);
> > > +		/*
> > > +		 * Conditionally reschedule to let other work items get
> > > +		 * a chance to unlock device-private pages whose locks
> > > +		 * we're spinning on.
> > > +		 */
> > > +		cond_resched();
> > > +
> > >  		/*
> > >  		 * When -EBUSY is returned the loop restarts with
> > >  		 * hmm_vma_walk.last set to an address that has not been stored
> > 
> > If the process which is running hmm_range_fault() has
> > SCHED_FIFO/SCHED_RR then cond_resched() doesn't work.  An explicit
> > msleep() would be better?
> 
> Unfortunately hmm_range_fault() is typically called from a gpu
> pagefault handler and it's crucial to get the gpu up and running again
> as fast as possible.

Would a millisecond matter?  Regular old preemption will often cause
longer delays.
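
(Untested sketch of what I mean, reusing the names from the quoted patch;
msleep() comes from <linux/delay.h>:)

	ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
			      &hmm_walk_ops, &hmm_vma_walk);
	/*
	 * Sleep instead of calling cond_resched(): msleep() blocks even
	 * when this task runs with SCHED_FIFO/SCHED_RR policy, so the
	 * work items holding the device-private page locks get CPU time.
	 */
	if (ret == -EBUSY)
		msleep(1);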

> Is there a way we could test for the cases where cond_resched() doesn't
> work and in that case instead call sched_yield(), at least on -EBUSY
> errors?

The kernel-internal sched_yield() was taken away years ago, and I don't
think there's a replacement - particularly not one which will cause a
realtime-policy task to yield to a non-rt-policy one.
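
The closest I can think of is special-casing realtime policy by hand
(untested sketch; rt_task() is in <linux/sched/rt.h>):

	if (ret == -EBUSY) {
		/* cond_resched() won't yield an RT task to non-RT tasks */
		if (rt_task(current))
			msleep(1);
		else
			cond_resched();
	}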

It's common for kernel code to forget that it could have realtime
policy - we probably have potential lockups in various places.

I suggest you rerun your testcase with this patch applied, using
`chrt -r', and see if my speculation is correct.
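
For example (the priority and testcase name here are made up):

	chrt -r 50 ./hmm-testcase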
