Re: [PATCH v2 5/5] locking/rwsem: Remove reader optimistic spinning

2020-12-08 Thread Waiman Long

On 12/8/20 5:07 AM, Peter Zijlstra wrote:

> On Fri, Nov 20, 2020 at 11:14:16PM -0500, Waiman Long wrote:
>
> > @@ -1032,40 +901,16 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
> >	 *
> >	 * We can take the read lock directly without doing
> >	 * rwsem_optimistic_spin() if the conditions are right.
>
> This comment no longer makes sense..

You are right. I forgot to take that out.

> > -	 * Also wake up other readers if it is the first reader.
> >	 */
> > -	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
> > -	    rwsem_no_spinners(sem)) {
> > +	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
> >		rwsem_set_reader_owned(sem);
> >		lockevent_inc(rwsem_rlock_steal);
> > -		if (rcnt == 1)
> > -			goto wake_readers;
> > -		return sem;
> > -	}
> >
> > -	/*
> > -	 * Save the current read-owner of rwsem, if available, and the
> > -	 * reader nonspinnable bit.
> > -	 */
> > -	waiter.last_rowner = owner;
> > -	if (!(waiter.last_rowner & RWSEM_READER_OWNED))
> > -		waiter.last_rowner &= RWSEM_RD_NONSPINNABLE;
> > -
> > -	if (!rwsem_can_spin_on_owner(sem, RWSEM_RD_NONSPINNABLE))
> > -		goto queue;
> > -
> > -	/*
> > -	 * Undo read bias from down_read() and do optimistic spinning.
> > -	 */
> > -	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
> > -	adjustment = 0;
> > -	if (rwsem_optimistic_spin(sem, false)) {
>
> since we're removing the optimistic spinning entirely on the read side.
>
> Also, I was looking at skipping patch #4, which mucks with the reader
> wakeup logic, and afaict this removal doesn't really depend on it.
>
> Or am I missing something?

That is true. Patch 4 isn't essential for this series. So if you are
generally OK with the current patchset, I can send out v3 that removes
patch 4 and makes the change you suggested above.


Cheers,
Longman



Re: [PATCH v2 5/5] locking/rwsem: Remove reader optimistic spinning

2020-12-08 Thread Peter Zijlstra
On Fri, Nov 20, 2020 at 11:14:16PM -0500, Waiman Long wrote:


> @@ -1032,40 +901,16 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
>	 *
>	 * We can take the read lock directly without doing
>	 * rwsem_optimistic_spin() if the conditions are right.

This comment no longer makes sense..

> -	 * Also wake up other readers if it is the first reader.
>	 */
> -	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
> -	    rwsem_no_spinners(sem)) {
> +	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
>		rwsem_set_reader_owned(sem);
>		lockevent_inc(rwsem_rlock_steal);
> -		if (rcnt == 1)
> -			goto wake_readers;
> -		return sem;
> -	}
>
> -	/*
> -	 * Save the current read-owner of rwsem, if available, and the
> -	 * reader nonspinnable bit.
> -	 */
> -	waiter.last_rowner = owner;
> -	if (!(waiter.last_rowner & RWSEM_READER_OWNED))
> -		waiter.last_rowner &= RWSEM_RD_NONSPINNABLE;
> -
> -	if (!rwsem_can_spin_on_owner(sem, RWSEM_RD_NONSPINNABLE))
> -		goto queue;
> -
> -	/*
> -	 * Undo read bias from down_read() and do optimistic spinning.
> -	 */
> -	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
> -	adjustment = 0;
> -	if (rwsem_optimistic_spin(sem, false)) {

since we're removing the optimistic spinning entirely on the read side.

Also, I was looking at skipping patch #4, which mucks with the reader
wakeup logic, and afaict this removal doesn't really depend on it.

Or am I missing something?




Re: [PATCH v2 5/5] locking/rwsem: Remove reader optimistic spinning

2020-12-07 Thread Davidlohr Bueso

On Fri, 20 Nov 2020, Waiman Long wrote:


Reader optimistic spinning is helpful when the reader critical section
is short and there aren't that many readers around. It also improves
the chance that a reader can get the lock, as writer optimistic spinning
disproportionately favors writers over readers.

Since commit d3681e269fff ("locking/rwsem: Wake up almost all readers
in wait queue"), all the waiting readers are woken up so that they can
all get the read lock and run in parallel. When the number of contending
readers is large, allowing reader optimistic spinning will likely cause
reader fragmentation where multiple smaller groups of readers can get
the read lock in a sequential manner separated by writers. That reduces
reader parallelism.

One possible way to address that drawback is to limit the number of
readers (preferably one) that can do optimistic spinning. These readers
act as representatives of all the waiting readers in the wait queue as
they will wake up all those waiting readers once they get the lock.

Alternatively, as reader optimistic lock stealing has already enhanced
fairness to readers, it may be easier to just remove reader optimistic
spinning and simplify the optimistic spinning code as a result.

Performance measurements (locking throughput in kops/s) using a locking
microbenchmark with a 50/50 reader/writer distribution and turbo-boost
disabled were done on a 2-socket Cascade Lake system (48-core/96-thread)
to see the impact of these changes:

 1) Vanilla     - 5.10-rc3 kernel
 2) Before      - 5.10-rc3 kernel with previous patches in this series
 3) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch
 4) no-rspin    - 5.10-rc3 kernel with reader spinning disabled

 # of threads  CS Load   Vanilla   Before   limit-rspin   no-rspin
 ------------  -------   -------   ------   -----------   --------
       2          1       5,185     5,662      5,214        5,077
       4          1       5,107     4,983      5,188        4,760
       8          1       4,782     4,564      4,720        4,628
      16          1       4,680     4,053      4,567        3,402
      32          1       4,299     1,115      1,118        1,098
      64          1       3,218       983      1,001          957
      96          1       1,938       944        957          930

       2         20       2,008     2,128      2,264        1,665
       4         20       1,390     1,033      1,046        1,101
       8         20       1,472     1,155      1,098        1,213
      16         20       1,332     1,077      1,089        1,122
      32         20         967       914        917          980
      64         20         787       874        891          858
      96         20         730       836        847          844

       2        100         372       356        360          355
       4        100         492       425        434          392
       8        100         533       537        529          538
      16        100         548       572        568          598
      32        100         499       520        527          537
      64        100         466       517        526          512
      96        100         406       497        506          509

The column "CS Load" represents the number of pause instructions issued
in the locking critical section. A CS load of 1 is extremely short and
is not likely in real situations. Loads of 20 (moderate) and 100 (long)
are more realistic.

It can be seen that the previous patches in this series have reduced
performance in general, except in highly contended cases with moderate
or long critical sections, where performance improves a bit. This change
is mostly caused by the "Prevent potential lock starvation" patch, which
reduces reader optimistic spinning and hence reader fragmentation.

The patch that further limits reader optimistic spinning doesn't seem to
have much impact on overall performance, as shown in the benchmark
data.

The patch that disables reader optimistic spinning shows reduced
performance in lightly loaded cases, but comparable or slightly better
performance with heavier contention.

This patch just removes reader optimistic spinning for now. As readers
are no longer going to do optimistic spinning, we don't need to
consider whether the OSQ is empty or not when doing lock stealing.

Signed-off-by: Waiman Long 


Reviewed-by: Davidlohr Bueso 


[PATCH v2 5/5] locking/rwsem: Remove reader optimistic spinning

2020-11-20 Thread Waiman Long
Reader optimistic spinning is helpful when the reader critical section
is short and there aren't that many readers around. It also improves
the chance that a reader can get the lock, as writer optimistic spinning
disproportionately favors writers over readers.

Since commit d3681e269fff ("locking/rwsem: Wake up almost all readers
in wait queue"), all the waiting readers are woken up so that they can
all get the read lock and run in parallel. When the number of contending
readers is large, allowing reader optimistic spinning will likely cause
reader fragmentation where multiple smaller groups of readers can get
the read lock in a sequential manner separated by writers. That reduces
reader parallelism.

One possible way to address that drawback is to limit the number of
readers (preferably one) that can do optimistic spinning. These readers
act as representatives of all the waiting readers in the wait queue as
they will wake up all those waiting readers once they get the lock.

Alternatively, as reader optimistic lock stealing has already enhanced
fairness to readers, it may be easier to just remove reader optimistic
spinning and simplify the optimistic spinning code as a result.

Performance measurements (locking throughput in kops/s) using a locking
microbenchmark with a 50/50 reader/writer distribution and turbo-boost
disabled were done on a 2-socket Cascade Lake system (48-core/96-thread)
to see the impact of these changes:

  1) Vanilla     - 5.10-rc3 kernel
  2) Before      - 5.10-rc3 kernel with previous patches in this series
  3) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch
  4) no-rspin    - 5.10-rc3 kernel with reader spinning disabled

  # of threads  CS Load   Vanilla   Before   limit-rspin   no-rspin
  ------------  -------   -------   ------   -----------   --------
        2          1       5,185     5,662      5,214        5,077
        4          1       5,107     4,983      5,188        4,760
        8          1       4,782     4,564      4,720        4,628
       16          1       4,680     4,053      4,567        3,402
       32          1       4,299     1,115      1,118        1,098
       64          1       3,218       983      1,001          957
       96          1       1,938       944        957          930

        2         20       2,008     2,128      2,264        1,665
        4         20       1,390     1,033      1,046        1,101
        8         20       1,472     1,155      1,098        1,213
       16         20       1,332     1,077      1,089        1,122
       32         20         967       914        917          980
       64         20         787       874        891          858
       96         20         730       836        847          844

        2        100         372       356        360          355
        4        100         492       425        434          392
        8        100         533       537        529          538
       16        100         548       572        568          598
       32        100         499       520        527          537
       64        100         466       517        526          512
       96        100         406       497        506          509

The column "CS Load" represents the number of pause instructions issued
in the locking critical section. A CS load of 1 is extremely short and
is not likely in real situations. Loads of 20 (moderate) and 100 (long)
are more realistic.

It can be seen that the previous patches in this series have reduced
performance in general, except in highly contended cases with moderate
or long critical sections, where performance improves a bit. This change
is mostly caused by the "Prevent potential lock starvation" patch, which
reduces reader optimistic spinning and hence reader fragmentation.

The patch that further limits reader optimistic spinning doesn't seem to
have much impact on overall performance, as shown in the benchmark
data.

The patch that disables reader optimistic spinning shows reduced
performance in lightly loaded cases, but comparable or slightly better
performance with heavier contention.

This patch just removes reader optimistic spinning for now. As readers
are no longer going to do optimistic spinning, we don't need to
consider whether the OSQ is empty or not when doing lock stealing.
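As a rough sketch of the simplified fast path (flag names mirror the diff,
but the bit values and the helper name are assumed for illustration, not the
kernel's actual definitions), the reader lock-steal test reduces to a single
flag check with no OSQ-empty condition:

```c
/*
 * Illustrative sketch of the simplified reader lock-steal check.
 * Bit positions are assumed; only the shape of the test mirrors
 * the patch.
 */
#include <stdbool.h>

#define RWSEM_WRITER_LOCKED	(1UL << 0)
#define RWSEM_FLAG_HANDOFF	(1UL << 1)

/*
 * A reader may steal the lock whenever no writer holds it and no
 * handoff to a waiter is pending. Before this patch, the steal was
 * additionally gated on there being no optimistic spinners.
 */
static bool reader_can_steal(unsigned long count)
{
	return !(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF));
}
```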

Signed-off-by: Waiman Long 
---
 kernel/locking/lock_events_list.h |   5 +-
 kernel/locking/rwsem.c            | 278 +-
 2 files changed, 48 insertions(+), 235 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 270a0d351932..97fb6f3f840a 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -56,12 +56,9 @@ LOCK_EVENT(rwsem_sleep_reader)	/* # of reader sleeps	*/
 LOCK_EVENT(rwsem_sleep_writer)	/* # of writer sleeps	*/
 LOCK_EVENT(rwsem_wake_reader)	/* # of reader wakeups	*/