On Tue, 2013-12-10 at 17:57 +0100, Peter Zijlstra wrote:
> On Tue, Dec 03, 2013 at 01:45:27AM -0800, Davidlohr Bueso wrote:
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -82,10 +82,12 @@
> >   * The waker side modifies the user space value of the futex and calls
> >   * futex_wake(). It computes the hash bucket and acquires the hash
> >   * bucket lock. Then it looks for waiters on that futex in the hash
> > - * bucket and wakes them.
> > - *
> > - * Note that the spin_lock serializes waiters and wakers, so that the
> > - * following scenario is avoided:
> > + * bucket and wakes them.
> 
> Why not let this be the start of a new paragraph?
> 
> >                        In scenarios where wakeups are called and no
> > + * tasks are blocked on a futex, taking the hb spinlock can be avoided
> > + * and simply return. In order for this optimization to work, ordering
> > + * guarantees must exist so that the waiter being added to the list is
> > + * acknowledged when the list is concurrently being checked by the waker,
> > + * avoiding scenarios like the following:
> >   *
> >   * CPU 0                               CPU 1
> >   * val = *futex;
> > @@ -106,24 +108,40 @@
> >   * This would cause the waiter on CPU 0 to wait forever because it
> >   * missed the transition of the user space value from val to newval
> >   * and the waker did not find the waiter in the hash bucket queue.
> > + * The correct serialization ensures that a waiter either observes
> > + * the changed user space value before blocking or is woken by a
> > + * concurrent waker:
> >   *
> >   * CPU 0                               CPU 1
> >   * val = *futex;
> >   * sys_futex(WAIT, futex, val);
> >   *   futex_wait(futex, val);
> > + *
> > + *   mb(); <-- paired with ------
> > + *                               |
> > + *   lock(hash_bucket(futex));   |
> > + *                               |
> > + *   uval = *futex;              |
> > + *                               |        *futex = newval;
> > + *                               |        sys_futex(WAKE, futex);
> > + *                               |          futex_wake(futex);
> > + *                               |
> > + *                               --------> mb();
> >   *   if (uval == val)
> > + *     queue();
> >   *     unlock(hash_bucket(futex));
> > + *     schedule();                        if (!queue_empty())
> > + *                                          lock(hash_bucket(futex));
> > + *                                          wake_waiters(futex);
> > + *                                          unlock(hash_bucket(futex));
> > + *
> > + * The length of the list is tracked with atomic ops (hb->waiters),
> > + * providing the necessary memory barriers for the waiters. For the
> > + * waker side, however, we rely on get_futex_key_refs(), using either
> > + * ihold() or the atomic_inc(), for shared futexes. The former provides
> > + * a full mb on all architectures. For architectures that do not have an
> > + * implicit barrier in atomic_inc/dec, we explicitly add it - please
> > + * refer to futex_get_mm() and hb_waiters_inc/dec().
> >   */
> 
> This comment actually confuses me :/
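(To make the ordering in the quoted comment concrete, here is a minimal
userspace model of the wait/wake protocol, sketched with C11 atomics.
Everything below is illustrative: waiters stands in for hb->waiters,
futex_val for the user space futex word, and the seq_cst operations for
the kernel's mb(). This is a model of the ordering argument, not the
actual futex.c implementation.)

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int waiters;   /* models hb->waiters               */
static atomic_int futex_val; /* models the user space futex word */

/* Waiter side (CPU 0 in the diagram). */
static bool wait_path(int val)
{
	/*
	 * Make the increment visible before re-reading the futex
	 * word; the seq_cst RMW plays the role of the first mb().
	 */
	atomic_fetch_add(&waiters, 1);

	/* lock(hash_bucket(futex)) would be taken here. */

	if (atomic_load(&futex_val) != val) {
		/* Value already changed: do not block, back out. */
		atomic_fetch_sub(&waiters, 1);
		return false;
	}

	/* queue(); unlock(); schedule(); in the real code. */
	return true;
}

/* Waker side (CPU 1 in the diagram). */
static bool wake_path(int newval)
{
	atomic_store(&futex_val, newval);

	/*
	 * Pairs with the waiter's RMW above: the store to the futex
	 * word must be visible before the emptiness check (the
	 * second mb() in the diagram).
	 */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load(&waiters) == 0)
		return false; /* nobody queued: skip the hb lock entirely */

	/* lock(); wake_waiters(); unlock(); in the real code. */
	return true;
}

Either the waiter's increment is visible to the waker's check, or the
waker's store is visible to the waiter's re-read; the barriers rule out
both sides missing each other.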
Well that explains where the required barriers are coming from.

> It isn't at all explained what purpose the memory barriers serve.

Why doesn't this explain it?

"The correct serialization ensures that a waiter either observes
the changed user space value before blocking or is woken by a
concurrent waker."

Perhaps adding an example?

    plist_add()     |   uaddr = newval
    smp_mb()        |   smp_mb()
    verify uaddr    |   plist_head_empty()

Thanks,
Davidlohr
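(The table above is the classic store-buffering pattern, and the claim
can be checked outside the kernel. Below is a small litmus-style
harness sketched with C11 atomics and pthreads; queued and fval are
illustrative stand-ins for the plist and the futex word, not futex.c
code. With both fences present, the 0/0 outcome, where the waiter sees
a stale value while the waker sees an empty list, i.e. a lost wakeup,
is forbidden; removing either fence makes it permitted by the memory
model. Build with something like cc -O2 -pthread litmus.c.)

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int queued;   /* stands in for the plist      */
static atomic_int fval;     /* stands in for the futex word */
static atomic_int r_waiter; /* what the waiter saw in fval  */
static atomic_int r_waker;  /* what the waker saw in queued */

static void *waiter(void *arg)
{
	(void)arg;
	atomic_store_explicit(&queued, 1, memory_order_relaxed); /* plist_add()  */
	atomic_thread_fence(memory_order_seq_cst);               /* smp_mb()     */
	atomic_store_explicit(&r_waiter,                         /* verify uaddr */
		atomic_load_explicit(&fval, memory_order_relaxed),
		memory_order_relaxed);
	return NULL;
}

static void *waker(void *arg)
{
	(void)arg;
	atomic_store_explicit(&fval, 1, memory_order_relaxed);   /* uaddr = newval     */
	atomic_thread_fence(memory_order_seq_cst);               /* smp_mb()           */
	atomic_store_explicit(&r_waker,                          /* plist_head_empty() */
		atomic_load_explicit(&queued, memory_order_relaxed),
		memory_order_relaxed);
	return NULL;
}

int main(void)
{
	for (int i = 0; i < 100000; i++) {
		pthread_t a, b;

		atomic_store(&queued, 0);
		atomic_store(&fval, 0);
		atomic_store(&r_waiter, 0);
		atomic_store(&r_waker, 0);

		pthread_create(&a, NULL, waiter, NULL);
		pthread_create(&b, NULL, waker, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);

		/*
		 * With both fences, at least one side must observe the
		 * other's store; seeing 0/0 would be a lost wakeup.
		 */
		if (!atomic_load(&r_waiter) && !atomic_load(&r_waker))
			printf("lost wakeup at iteration %d\n", i);
	}
	return 0;
}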