On Fri, 30 Jan 2026 20:56:31 +0100 Thomas Hellström <[email protected]> wrote:
> > > --- a/mm/hmm.c
> > > +++ b/mm/hmm.c
> > > @@ -674,6 +674,13 @@ int hmm_range_fault(struct hmm_range *range)
> > >  				return -EBUSY;
> > >  			ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
> > >  					      &hmm_walk_ops, &hmm_vma_walk);
> > > +			/*
> > > +			 * Conditionally reschedule to let other work items get
> > > +			 * a chance to unlock device-private pages whose locks
> > > +			 * we're spinning on.
> > > +			 */
> > > +			cond_resched();
> > > +
> > >  			/*
> > >  			 * When -EBUSY is returned the loop restarts with
> > >  			 * hmm_vma_walk.last set to an address that has not been stored
> > 
> > If the process which is running hmm_range_fault() has
> > SCHED_FIFO/SCHED_RR then cond_resched() doesn't work.  An explicit
> > msleep() would be better?
> 
> Unfortunately hmm_range_fault() is typically called from a gpu
> pagefault handler and it's crucial to get the gpu up and running again
> as fast as possible.

Would a millisecond matter?  Regular old preemption will often cause
longer delays.

> Is there a way we could test for the cases where cond_resched() doesn't
> work and in that case instead call sched_yield(), at least on -EBUSY
> errors?

kernel-internal sched_yield() was taken away years ago and I don't think
there's a replacement, particularly one which will cause a
realtime-policy task to yield to a non-rt-policy one.

It's common for kernel code to forget that it could have realtime policy
- we probably have potential lockups in various places.

I suggest you rerun your testcase with this patch using `chrt -r', see
if my speculation is correct.
