On Fri, Jan 30, 2026 at 12:38:10PM -0800, Andrew Morton wrote:
> On Fri, 30 Jan 2026 20:56:31 +0100 Thomas Hellström
> <[email protected]> wrote:
>
> > > > --- a/mm/hmm.c
> > > > +++ b/mm/hmm.c
> > > > @@ -674,6 +674,13 @@ int hmm_range_fault(struct hmm_range *range)
> > > >  			return -EBUSY;
> > > >  		ret = walk_page_range(mm, hmm_vma_walk.last, range->end,
> > > >  				      &hmm_walk_ops, &hmm_vma_walk);
> > > > +		/*
> > > > +		 * Conditionally reschedule to let other work items get
> > > > +		 * a chance to unlock device-private pages whose locks
> > > > +		 * we're spinning on.
> > > > +		 */
> > > > +		cond_resched();
> > > > +
> > > >  		/*
> > > >  		 * When -EBUSY is returned the loop restarts with
> > > >  		 * hmm_vma_walk.last set to an address that has not been stored
> > >
> > > If the process which is running hmm_range_fault() has
> > > SCHED_FIFO/SCHED_RR then cond_resched() doesn't work. An explicit
> > > msleep() would be better?
> >
> > Unfortunately hmm_range_fault() is typically called from a gpu
> > pagefault handler and it's crucial to get the gpu up and running again
> > as fast as possible.
>
> Would a millisecond matter? Regular old preemption will often cause
> longer delays.
>
I think a millisecond is too high. We are aiming to get GPU page faults
serviced in 10-15us of CPU time (GPU copy time varies with the size of
the fault and copy-bus speed, but is still at most 200us).

Matt

> > Is there a way we could test for the cases where cond_resched() doesn't
> > work and in that case instead call sched_yield(), at least on -EBUSY
> > errors?
>
> kernel-internal sched_yield() was taken away years ago and I don't
> think there's a replacement, particularly one which will cause a
> realtime-policy task to yield to a non-rt-policy one.
>
> It's common for kernel code to forget that it could have realtime
> policy - we probably have potential lockups in various places.
>
> I suggest you rerun your testcase with this patch using `chrt -r', see
> if my speculation is correct.
