On 01/18/2019 03:06 PM, Peter Zijlstra wrote:
> On Fri, Jan 18, 2019 at 09:50:12AM -0500, Waiman Long wrote:
>> On 01/18/2019 05:02 AM, Peter Zijlstra wrote:
>>>> e.g. We can't take an SError during the SError handler.
>>>>
>>>> But we can take this SError/NMI on another CPU while the first one is
>>>> still running the handler.
>>>>
>>>> These multiple NMI-like notifications mean having multiple
>>>> locks/fixmap-slots, one per notification. This is where the qspinlock
>>>> node limit comes in, as we could have more than 4 contexts.
>>> Right; so Waiman was going to do a patch that reverts to test-and-set or
>>> something along those lines once we hit the queue limit, which seems
>>> like a good way out. Actually hitting that nesting level should be
>>> exceedingly rare.
>> Yes, I am working on a patch to support arbitrary levels of nesting. It
>> is easy for PV qspinlock as lock stealing is supported.
>>
>> For native qspinlock, we cannot do lock stealing without incurring a
>> certain amount of overhead in the regular slowpath code. It was up to
>> 10% in my own testing. So I am exploring an alternative that can do the
>> job without incurring any noticeable performance degradation in the
>> slowpath. I ran into a race condition which I am still trying to find
>> out where that comes from. Hopefully, I will have something to post next
>> week.
> Where does the overhead come from? Surely that's not just checking that
> bound?
It is not about checking the bound; it is about how to acquire the lock
without using an MCS node. The overhead comes from using an atomic
instruction to acquire the lock, instead of a non-atomic one, in order to
allow lock stealing. I have appended two rough sketches below: one showing
where the atomic overhead comes from, and one for the test-and-set fallback
mentioned above.

Cheers,
Longman
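
To make the overhead point concrete, here is a minimal user-space sketch
using C11 atomics. This is not the kernel code; the toy names
(head_set_locked, steal_acquire) are made up purely for illustration. When
only the queue head is allowed to take the lock, a plain store of the locked
value is enough; once any CPU may steal the lock, even the queue head has to
win a cmpxchg loop, and that extra atomic is what shows up in the regular
slowpath:

#include <stdatomic.h>
#include <stdio.h>

/* Toy lock word: 0 = unlocked, 1 = locked. */
struct toy_qspinlock {
	atomic_int val;
};

/*
 * Native-style handoff: at this point the MCS queue head is the only
 * context allowed to acquire the lock, so a plain store is sufficient;
 * ordering comes from the wait loop that preceded it.
 */
void head_set_locked(struct toy_qspinlock *lock)
{
	atomic_store_explicit(&lock->val, 1, memory_order_relaxed);
}

/*
 * Stealing-friendly acquire: any context may grab a free lock, so even
 * the queue head must win an atomic cmpxchg to claim it.  This is the
 * extra cost in the regular slowpath mentioned above.
 */
void steal_acquire(struct toy_qspinlock *lock)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak_explicit(&lock->val, &expected,
						      1, memory_order_acquire,
						      memory_order_relaxed))
		expected = 0;	/* lost the race or spurious failure; retry */
}

int main(void)
{
	struct toy_qspinlock lock = { .val = 0 };

	steal_acquire(&lock);
	atomic_store_explicit(&lock.val, 0, memory_order_release);
	head_set_locked(&lock);
	printf("locked = %d\n", atomic_load(&lock.val));
	return 0;
}

The plain store is only safe because, with waiters queued, no other context
is allowed to touch the locked byte; lock stealing breaks exactly that
guarantee, which is why the atomic is needed.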
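
And for the fallback Peter mentions above (reverting to test-and-set once
the MCS node limit is hit), the rough shape would be something like the
untested sketch below. TOY_MAX_NODES stands in for the four per-CPU qnodes
(task, softirq, hardirq, NMI); the nesting counter and all the names are
made up and only meant to show where the test-and-set path would slot in,
not how the real patch is structured:

#include <stdatomic.h>

#define TOY_MAX_NODES	4	/* task, softirq, hardirq, NMI */

struct toy_qspinlock {
	atomic_int val;		/* 0 = unlocked, 1 = locked */
};

/* Toy stand-in for the per-CPU count of MCS nodes in use. */
static _Thread_local int toy_nodes_in_use;

void toy_spin_lock_slowpath(struct toy_qspinlock *lock)
{
	if (toy_nodes_in_use >= TOY_MAX_NODES) {
		/*
		 * No MCS node left for this context (e.g. yet another
		 * nested NMI-like exception): give up on queueing and
		 * just spin with test-and-set until the lock is free.
		 */
		while (atomic_exchange_explicit(&lock->val, 1,
						memory_order_acquire))
			;	/* spin */
		return;
	}

	toy_nodes_in_use++;
	/* ... queue on the per-CPU MCS node and acquire the lock ... */
	toy_nodes_in_use--;
}

int main(void)
{
	struct toy_qspinlock lock = { .val = 0 };

	toy_nodes_in_use = TOY_MAX_NODES;	/* pretend all nodes are busy */
	toy_spin_lock_slowpath(&lock);		/* takes the test-and-set path */
	return 0;
}

The only point here is that once the per-CPU node index would run past the
array, the waiter stops queueing and falls back to a plain spin, so the
common queued path stays untouched.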

