On 08/29/2016 03:44 PM, Peter Zijlstra wrote:
> If you add a barrier, the Changelog had better be clear. And I'm still
> not entirely sure I get what exactly this barrier should do, nor why it
> defaults to a full smp_mb. If what I suspect it should do, only PPC and
> ARM64 need the barrier.
The barrier must ensure that the store that takes the spinlock (as
observed by another cpu with spin_unlock_wait()) is ordered before a
following read.
start condition: sma->complex_mode = false;
CPU 1:
	spin_lock(&sem->lock); /* sem_nsems instances */
	smp_mb__after_spin_lock();

	if (!smp_load_acquire(&sma->complex_mode)) {
		/* fast path successful! */
		return sops->sem_num;
	}
	/* slow path, not relevant */
CPU 2: (holding sma->sem_perm.lock)
	smp_store_mb(sma->complex_mode, true);

	for (i = 0; i < sma->sem_nsems; i++) {
		spin_unlock_wait(&sma->sem_base[i].lock);
	}
It must not happen that both CPUs proceed:
Either CPU1 proceeds, in which case CPU2 must spin in spin_unlock_wait(),
or CPU2 proceeds, in which case CPU1 must enter the slow path.
What about this?
/*
 * spin_lock() provides ACQUIRE semantics with respect to reading the
 * lock value, but there is no guarantee that the store to the lock
 * itself is visible to other CPUs before any read or write operation
 * within the protected region is performed.
 * If the store to the lock must happen first, this function is required.
 */
#define spin_lock_store_acquire()
I would update the patch series.
--
Manfred