On Fri, 20 May 2016, Peter Zijlstra wrote:
> The problem is that the clear_pending_set_locked() is an unordered
> store, therefore this store can be delayed until no later than
> spin_unlock() (which orders against it due to the address dependency).
>
> This opens numerous races; for example:
>
>	ipc_lock_object(&sma->sem_perm);
>	sem_wait_array(sma);
>
>		false   -> spin_is_locked(&sma->sem_perm.lock)
>
> is entirely possible, because sem_wait_array() consists of pure reads,
> so the store can pass all that, even on x86.

I had pondered at the unordered stores in clear_pending_set_locked() for
arm, for example, but I _certainly_ missed this for x86 inside the ACQUIRE
region.

Thanks,
Davidlohr
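
For readers following along, the shape of the problem can be sketched in
userspace C11 atomics. This is only an illustrative model, not kernel code:
struct toy_lock, the TOY_* bit values and all toy_* functions are
hypothetical names, and the fenced variant is just one conceivable way to
order the store, not necessarily what was or should be merged.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy lock word (assumption, not struct qspinlock): bit 0 = locked, bit 8 = pending. */
#define TOY_LOCKED  0x001u
#define TOY_PENDING 0x100u

struct toy_lock {
	_Atomic uint32_t val;
};

/*
 * Models the unordered store: pending -> locked with relaxed ordering.
 * Nothing here stops later loads in the same thread from being
 * satisfied before this store becomes visible to other threads.
 */
static void toy_clear_pending_set_locked(struct toy_lock *l)
{
	atomic_store_explicit(&l->val, TOY_LOCKED, memory_order_relaxed);
}

/*
 * Hypothetical ordered variant: the same store followed by a full
 * fence, similar in spirit to placing smp_mb() after the store.
 */
static void toy_clear_pending_set_locked_mb(struct toy_lock *l)
{
	atomic_store_explicit(&l->val, TOY_LOCKED, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Pure read of the lock word, in the role of spin_is_locked(). */
static bool toy_is_locked(struct toy_lock *l)
{
	return (atomic_load_explicit(&l->val, memory_order_relaxed) & TOY_LOCKED) != 0;
}
```

A single-threaded run always sees the lock as held after either variant;
the false result from spin_is_locked() described above needs a second
thread reading the lock word while the owner's store is still pending.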

