On Tue, 6 Jan 2026 11:17:19 -0800
"Paul E. McKenney" <[email protected]> wrote:

> > Interesting. I might look into the boosting logic to see whether we can
> > avoid boosting certain tasks depending on whether they help the grace
> > period complete or not. Thank you for the suggestion.
> 
> Just so you know, all of my simplification efforts thus far have instead
> made it more complex, but who knows what I might have been missing?

Maybe you are too smart to make it simple? ;-)


> I could easily believe that the vCPU preemption problem needs to be
> addressed, but doing so on a per-spinlock basis would lead to greatly
> increased complexity throughout the kernel, not just RCU.

Agreed.

> 
> > > The main point of this patch series is to avoid lock contention due to
> > > vCPU preemption, correct?  If so, will we need similar work on the other
> > > locks in the Linux kernel, both within RCU and elsewhere?  I vaguely
> > > recall your doing some work along those lines a few years back, and
> > > maybe Thomas Gleixner's deferred-preemption work could help with this.
> > > Or not, who knows?  Keeping the hypervisor informed of lock state is
> > > not necessarily free.  
> > 
> > Yes, I did some work on this at Google, but it turned out to be a very
> > fragmented effort in terms of where (which subsystem - KVM, scheduler,
> > etc.) we should do the priority boosting of vCPU threads. In the end,
> > we just ended up with an internal prototype that was not upstreamable
> > but worked pretty well and only had time for production (a lesson I
> > learned there is we should probably work on upstream solutions first,
> > but life is not that easy sometimes).
> 
> Which is one reason deferred preemption would be attractive.

Yes. That's why I've been pushing it.

> 
> > About the deferred preemption, I believe Steven Rostedt at one point
> > was looking at that for VMs, but that effort stalled as Peter is
> > concerned that doing that would mess up the scheduler. The idea (AFAIU)
> > is to use the rseq page to communicate locking information between vCPU
> > threads and the host, and then let the host avoid vCPU preemption - but
> > the scheduler needs to do something with that information. Otherwise,
> > it's no use.
> 
> Has deferred preemption for userspace locking also stalled?  If not,
> then the scheduler's support for userspace should apply directly to
> guest OSes, right?

No, the user space deferred preemption is still moving along nicely (I
believe Thomas has completed most of it). The issue here is that the
deferral happens on the return to user space. That's a different path than
the return to the guest, so the logic needs to be added to that path too.

One thing that Peter Zijlstra pushed for was limiting the amount of time
that the deferral may last. He says user space spinlocks are a bad design,
but it has been shown that they are currently the most efficient approach
for very short critical sections - that is, where the critical section is
shorter than the cost of a system call. Thus, he caps the deferred
scheduling at 50us (he's also suggested less than that).

But when it comes to the guest, where guest kernel spinlocks look like user
space spinlocks to the host and can be held for more than 50us, I would
like a way to have guests defer the scheduling for even longer than user
space spinlocks are allowed to.

-- Steve
