On 09/26/2018 01:51 PM, Peter Zijlstra wrote:
> On Wed, Sep 26, 2018 at 12:20:09PM -0400, Waiman Long wrote:
>> On 09/26/2018 07:01 AM, Peter Zijlstra wrote:
>>> Back when Will did his qspinlock determinism patches, we were left
>>> with one cmpxchg loop on x86 due to the use of atomic_fetch_or().
>>> Will proposed a nifty trick:
>>>
>>>   http://lkml.kernel.org/r/20180409145409.ga9...@arm.com
>>>
>>> But at the time we didn't pursue it. This series implements that and
>>> argues for its correctness. In particular it places an
>>> smp_mb__after_atomic() in between the two operations, which forces
>>> the load to come after the store (which is free on x86 anyway).
>>>
>>> In particular this ordering ensures a concurrent unlock cannot
>>> trigger the uncontended handoff. Also it ensures that if the xchg()
>>> happens after a (successful) trylock, we must observe that LOCKED
>>> bit.
>>
>> When you said "concurrent unlock cannot trigger the uncontended
>> handoff", are you saying the current code has this uncontended handoff
>> problem or just when comparing to doing a load first followed by
>> xchg()?
>
> In particular I'm talking about this:
>
> locked:
> 	/*
> 	 * claim the lock:
> 	 *
> 	 * n,0,0 -> 0,0,1 : lock, uncontended
> 	 * *,0,0 -> *,0,1 : lock, contended
> 	 *
> 	 * If the queue head is the only one in the queue (lock value == tail)
> 	 * and nobody is pending, clear the tail code and grab the lock.
> 	 * Otherwise, we only need to grab the lock.
> 	 */
> 	/*
> 	 * In the PV case we might already have _Q_LOCKED_VAL set, because
> 	 * of lock stealing; therefore we must also allow:
> 	 *
> 	 * n,0,1 -> 0,0,1
> 	 *
> 	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
> 	 * above wait condition, therefore any concurrent setting of
> 	 * PENDING will make the uncontended transition fail.
> 	 */
> 	if ((val & _Q_TAIL_MASK) == tail) {
> 		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
> 			goto release; /* No contention */
> 	}
>
> Here queue head acquires the lock and we take the "uncontended" path.
> If we set PENDING prior to this, the cmpxchg above will fail, and the
> later load will observe the tail.
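For anyone following along, here is a minimal userspace sketch of the store-then-load pattern being discussed, using C11 atomics in place of the kernel primitives. It is an illustration only: the struct layout assumes little-endian, the mixed-size atomic accesses through the union are what the kernel relies on but are outside strict C11, and the function name and constants are made up for this example (they mirror, not reproduce, the qspinlock code).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative constants mirroring the qspinlock word layout:
 * bits 0-7 locked byte, bits 8-15 pending byte, bits 16-31 tail. */
#define _Q_LOCKED_VAL     1u
#define _Q_PENDING_OFFSET 8
#define _Q_PENDING_VAL    (1u << _Q_PENDING_OFFSET)
#define _Q_PENDING_MASK   (0xffu << _Q_PENDING_OFFSET)

struct qspinlock {
	union {
		_Atomic uint32_t val;
		struct {                      /* little-endian layout assumed */
			_Atomic uint8_t  locked;
			_Atomic uint8_t  pending;
			_Atomic uint16_t tail;
		};
	};
};

/*
 * Sketch of the fetch_or() replacement: a byte-wide exchange sets
 * PENDING, a full fence orders the subsequent word-wide load after
 * that store, and the load supplies the LOCKED and TAIL bits.
 */
static uint32_t fetch_set_pending_acquire(struct qspinlock *lock)
{
	/* store: set the pending byte, remember its old value */
	uint32_t val = (uint32_t)atomic_exchange_explicit(&lock->pending, 1,
							  memory_order_relaxed)
		       << _Q_PENDING_OFFSET;

	/* stand-in for smp_mb__after_atomic(): the load must follow the store */
	atomic_thread_fence(memory_order_seq_cst);

	/* load: pick up LOCKED and TAIL as they are *after* the store */
	val |= atomic_load_explicit(&lock->val, memory_order_acquire)
	       & ~_Q_PENDING_MASK;

	return val;
}
```

The point of the ordering: once the PENDING store is globally visible, a queue head racing through the quoted `atomic_try_cmpxchg_relaxed()` cannot complete the n,0,0 -> 0,0,1 transition, and the load here, coming after that store, will observe the tail rather than an apparently uncontended lock.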
Thanks for the clarification. I was wondering whether you were talking
about a latent bug in the code; good to know that is not the case here.

Cheers,
Longman