On Fri, Jan 30, 2026 at 01:08:35PM -0800, Andrew Morton wrote:
> On Fri, 30 Jan 2026 13:01:24 -0800 Matthew Brost <[email protected]> wrote:
> 
> > > > Unfortunately hmm_range_fault() is typically called from a gpu
> > > > pagefault handler and it's crucial to get the gpu up and running again
> > > > as fast as possible.
> > > 
> > > Would a millisecond matter? Regular old preemption will often cause
> > > longer delays.
> > 
> > I think millisecond is too high. We are aiming to GPU page faults
> > serviced in 10-15us of CPU time (GPU copy time varies based on size of
> > fault / copy bus speed but still at most 200us).
> 
> But it's a rare case?
Not that rare. I believe this code path (where hmm_range_fault() returns
-EBUSY) can be triggered any time HMM_PFN_REQ_FAULT is set and a page needs
to be faulted in. We don't set HMM_PFN_REQ_FAULT in our GPU fault handler
unless migrations are racing, which should indeed be rare with well-behaved
user space. But there are other cases, such as userptr binds, that do set
HMM_PFN_REQ_FAULT and where faulting in a bunch of CPU pages is somewhat
expected (the caller-side retry loop in question is sketched below my
sign-off). Doing an msleep in core code that a bunch of drivers call
probably isn't a great idea unless it is truly the last resort.

> Am I incorrect in believing that getting preempted will cause latencies
> much larger than this?

I'm not really sure (I'm not a scheduling expert), but from my research I
think preemption latency is still less than 1ms, and in cases where you
don't preempt, cond_resched() is basically free.

> > Matt
> > 
> > > > Is there a way we could test for the cases where cond_resched() doesn't
> > > > work and in that case instead call sched_yield(), at least on -EBUSY
> > > > errors?
> > > 
> > > kernel-internal sched_yield() was taken away years ago and I don't
> > > think there's a replacement, particularly one which will cause a
> > > realtime-policy task to yield to a non-rt-policy one.
> > > 
> > > It's common for kernel code to forget that it could have realtime
> > > policy - we probably have potential lockups in various places.
> > > 
> > > I suggest you rerun your testcase with this patch using `chrt -r', see
> > > if my speculation is correct.
> 
> Please?

Thomas is in Europe, so he's already done for the day. But I tested this
fix and verified that it resolves the hang we were seeing. I also did at
least 10 runs with chrt -r 1, chrt -r 50, and chrt -r 99 and couldn't get
it to hang; previously I could reproduce the hang within at most 2 runs.
In this test case all the threads involved run from work queues, which I
believe do not inherit real-time scheduling policies, so that likely
explains the chrt result. I think we'd have to craft a test case that
triggers an hmm_range_fault() from a user space call (userptr binds would
do this) and race it with a migration to catch any RT bugs.

Matt
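
For reference, a minimal sketch of the caller-side retry loop I'm
describing, following the documented hmm_range_fault() usage. The function
name and locals below are placeholders for illustration, not our actual
driver code, and it assumes the caller already holds a reference on the mm:

#include <linux/hmm.h>
#include <linux/mmap_lock.h>
#include <linux/mmu_notifier.h>
#include <linux/sched.h>

static int example_populate_range(struct mmu_interval_notifier *notifier,
				  struct mm_struct *mm,
				  unsigned long start, unsigned long end,
				  unsigned long *pfns)
{
	struct hmm_range range = {
		.notifier = notifier,
		.start = start,
		.end = end,
		.hmm_pfns = pfns,
		/* Fault pages in; this is the flag that makes -EBUSY likely. */
		.default_flags = HMM_PFN_REQ_FAULT,
	};
	int ret;

	while (true) {
		range.notifier_seq = mmu_interval_read_begin(notifier);

		mmap_read_lock(mm);
		ret = hmm_range_fault(&range);
		mmap_read_unlock(mm);

		if (ret == -EBUSY) {
			/* Scheduling point in the caller on the retry path. */
			cond_resched();
			continue;
		}
		if (ret)
			return ret;

		/*
		 * A real driver would now take its page-table update lock and
		 * retry if mmu_interval_read_retry(notifier,
		 * range.notifier_seq) reports an invalidation; the pfns are
		 * only valid while that check holds.
		 */
		break;
	}

	return 0;
}

This is the loop shape where the -EBUSY spin shows up, whether the
scheduling point ends up here in the caller or inside hmm_range_fault()
itself.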
