Peter,

On Tue, Aug 09, 2016 at 11:51:12AM +0200, Peter Zijlstra wrote:
> Currently the percpu-rwsem switches to (global) atomic ops while a
> writer is waiting; which could be quite a while and slows down
> releasing the readers.
>
> This patch cures this problem by ordering the reader-state vs
> reader-count (see the comments in __percpu_down_read() and
> percpu_down_write()). This changes a global atomic op into a full
> memory barrier, which doesn't have the global cacheline contention.
>
> This also enables using the percpu-rwsem with rcu_sync disabled in order
> to bias the implementation differently, reducing the writer latency by
> adding some cost to readers.
>
> Cc: Paul McKenney <paul...@linux.vnet.ibm.com>
> Reviewed-by: Oleg Nesterov <o...@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> ---
>  include/linux/percpu-rwsem.h  |   84 +++++++++++++--
>  kernel/locking/percpu-rwsem.c |  228 ++++++++++++++++++++++++------------------
>  2 files changed, 206 insertions(+), 106 deletions(-)
>
> --- a/include/linux/percpu-rwsem.h
> +++ b/include/linux/percpu-rwsem.h
> @@ -10,30 +10,96 @@
>
>  struct percpu_rw_semaphore {
>  	struct rcu_sync		rss;
> -	unsigned int __percpu	*fast_read_ctr;
> +	unsigned int __percpu	*read_count;
>  	struct rw_semaphore	rw_sem;
> -	atomic_t		slow_read_ctr;
> -	wait_queue_head_t	write_waitq;
> +	wait_queue_head_t	writer;
> +	int			readers_block;
>  };
>
> -extern void percpu_down_read(struct percpu_rw_semaphore *);
> -extern int  percpu_down_read_trylock(struct percpu_rw_semaphore *);
> -extern void percpu_up_read(struct percpu_rw_semaphore *);
> +extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
> +extern void __percpu_up_read(struct percpu_rw_semaphore *);
> +
> +static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
> +{
> +	might_sleep();
> +
> +	rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_);
> +
> +	preempt_disable();
> +	/*
> +	 * We are in an RCU-sched read-side critical section, so the writer
> +	 * cannot both change sem->state from readers_fast and start checking
> +	 * counters while we are here. So if we see !sem->state, we know that
> +	 * the writer won't be checking until we're past the preempt_enable()
> +	 * and that once the synchronize_sched() is done, the writer will see
> +	 * anything we did within this RCU-sched read-side critical section.
> +	 */
> +	__this_cpu_inc(*sem->read_count);
> +	if (unlikely(!rcu_sync_is_idle(&sem->rss)))
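Some context before the question below. If I read include/linux/rcu_sync.h
correctly, rcu_sync_is_idle() is a static inline roughly like this (quoted
from memory, so treat it as a sketch rather than the exact tree state):

	static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
	{
		/* lockdep check; out of line in kernel/rcu/sync.c if PROVE_RCU */
		rcu_sync_lockdep_assert(rsp);
		return !rsp->gp_state; /* GP_IDLE */
	}

With CONFIG_PROVE_RCU=y, rcu_sync_lockdep_assert() is an out-of-line
function in kernel/rcu/sync.c and is not exported, so any module that
inlines this fast path ends up with an unresolved reference to it.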
The call to rcu_sync_is_idle() causes the following build error when
building x86_64:allmodconfig.

ERROR: "rcu_sync_lockdep_assert" [kernel/locking/locktorture.ko] undefined!
ERROR: "rcu_sync_lockdep_assert" [fs/ext4/ext4.ko] undefined!

I think this was also reported by the 0-day build bot.

The simple fix would of course be to export rcu_sync_lockdep_assert
(sketch appended below my sig). Before I apply that change to the Android
code (where the patch has been applied and the problem is seen) - do you
by any chance have a better solution in mind ?

Thanks,
Guenter
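---
For reference, the change I have in mind. This is an untested sketch: the
function body is my recollection of kernel/rcu/sync.c, and the only
intended change is the added EXPORT_SYMBOL_GPL() line.

	/* kernel/rcu/sync.c */
	#ifdef CONFIG_PROVE_RCU
	void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
	{
		RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
				 "suspicious rcu_sync_is_idle() usage");
	}
	EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);	/* <-- added */
	#endif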