On 07/14, Peter Zijlstra wrote:
>
> Currently the percpu-rwsem switches to (global) atomic ops while a
> writer is waiting; which could be quite a while and slows down
> releasing the readers.
>
> This patch cures this problem by ordering the reader-state vs
> reader-count (see the comments in __percpu_down_read() and
> percpu_down_write()). This changes a global atomic op into a full
> memory barrier, which doesn't have the global cacheline contention.
I've applied this patch plus the other change you sent on top of it.
Everything looks good to me except the __this_cpu_inc() in
__percpu_down_read():

> +	__down_read(&sem->rw_sem);
> +	__this_cpu_inc(*sem->read_count);
> +	__up_read(&sem->rw_sem);

Preemption is already enabled at this point, so don't we need
this_cpu_inc()?

> -EXPORT_SYMBOL_GPL(percpu_up_write);
> +EXPORT_SYMBOL(percpu_up_write);

And this one ;) I do not really care, but it seems you made this change
by accident.

Actually, I _think_ we can do some cleanups/improvements on top of this
change, but we can do them later. In particular, _perhaps_ we can avoid
the unconditional wakeup in __percpu_up_read(), but I am not sure and in
any case this needs another change.

Reviewed-by: Oleg Nesterov <[email protected]>

