Hi Paul,
I wrote:
> > The process had many threads active.
>
> It should use 11 threads. You didn't see 100 threads, right?
Looking at the test's code, it should use 21 threads.
1) I measured the execution time of this test, on various distributions,
with various kernels, and various numbers of CPUs (in VirtualBox).
In Pop OS 22.04, Linux 6.8.0 PREEMPT
  1 CPU      8 CPUs
  2 sec      53 sec
In Pop OS 22.04, Linux 6.9.3 PREEMPT
  1 CPU      2 CPUs     4 CPUs     8 CPUs          10 CPUs
  2 sec      1.2 sec    2.4 sec    51 sec          103 sec
                                   real  51 sec    real 103 sec
                                   user  41 sec    user 121 sec
                                   sys  211 sec    sys  661 sec
In Ubuntu 22.04, Linux 5.15.0
  1 CPU      2 CPUs     4 CPUs     8 CPUs
  0.9 sec    0.5 sec    14 sec     65 sec
                                   real  65 sec
                                   user 111 sec
                                   sys  282 sec
In Ubuntu 24.04, Linux 6.8.0
  1 CPU      2 CPUs     4 CPUs     8 CPUs
  2.1 sec    1.3 sec    3 sec      58 sec
                                   real  58 sec
                                   user  36 sec
                                   sys  270 sec
So, clearly, this test takes a long time with many CPUs, and the problem
is not tied to a particular kernel version.
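For readers who want to reproduce this kind of real/user/sys breakdown, here is a minimal sketch of a contended rwlock workload that times itself. It is not the Gnulib test: the thread counts, the iteration count, and the names run_workload, writer_thread and reader_thread are hypothetical, chosen only to keep the run short. With several CPUs busy at once, user and sys are summed over all threads, which is why they can exceed the real time.

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L  /* ask for POSIX rwlocks and clock_gettime */
#endif
#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>
#include <time.h>

/* Hypothetical workload sizes, chosen only to keep the run short. */
enum { N_WRITERS = 4, N_READERS = 4, ITERATIONS = 5000 };

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
static long counter;  /* protected by 'lock' */

static void *writer_thread (void *arg)
{
  (void) arg;
  for (int i = 0; i < ITERATIONS; i++)
    {
      pthread_rwlock_wrlock (&lock);
      counter++;
      pthread_rwlock_unlock (&lock);
    }
  return NULL;
}

static void *reader_thread (void *arg)
{
  long *last_seen = arg;
  for (int i = 0; i < ITERATIONS; i++)
    {
      pthread_rwlock_rdlock (&lock);
      *last_seen = counter;
      pthread_rwlock_unlock (&lock);
    }
  return NULL;
}

static double tv_seconds (struct timeval tv)
{
  return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
}

/* Runs the workload once and prints real/user/sys times.
   Returns the final counter, or -1 if a thread could not be created. */
long run_workload (void)
{
  pthread_t threads[N_WRITERS + N_READERS];
  long seen[N_READERS] = { 0 };
  struct timespec t0, t1;
  struct rusage r0, r1;
  int started = 0;

  counter = 0;
  clock_gettime (CLOCK_MONOTONIC, &t0);
  getrusage (RUSAGE_SELF, &r0);

  for (int i = 0; i < N_WRITERS + N_READERS; i++)
    {
      int err = (i < N_WRITERS
                 ? pthread_create (&threads[i], NULL, writer_thread, NULL)
                 : pthread_create (&threads[i], NULL, reader_thread,
                                   &seen[i - N_WRITERS]));
      if (err != 0)
        break;
      started++;
    }
  for (int i = 0; i < started; i++)
    pthread_join (threads[i], NULL);

  clock_gettime (CLOCK_MONOTONIC, &t1);
  getrusage (RUSAGE_SELF, &r1);

  printf ("real %.3f sec\nuser %.3f sec\nsys  %.3f sec\n",
          (double) (t1.tv_sec - t0.tv_sec)
          + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9,
          tv_seconds (r1.ru_utime) - tv_seconds (r0.ru_utime),
          tv_seconds (r1.ru_stime) - tv_seconds (r0.ru_stime));

  return started == N_WRITERS + N_READERS ? counter : -1;
}
```

Calling run_workload() from main and building with -pthread is enough to try it with different CPU counts; the returned counter should equal N_WRITERS * ITERATIONS.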
2) Gnulib has various implementations of locks, and the unit tests are
all similar. So I compared, on Ubuntu 24.04 with 8 CPUs:
  test-pthread-mutex     0.6 sec
  test-pthread-rwlock    48 sec
  test-lock              4 sec
  test-rwlock1           0.2..0.4 sec
  test-mtx               0.6 sec
So, the Gnulib rwlocks are fast, but the glibc rwlocks are slow. What's
the difference?
The difference is that Gnulib tests whether the system's rwlocks prefer
writers (at configure time: m4/pthread_rwlock_rdlock.m4) and, if not,
uses a different implementation. On glibc, the Gnulib rwlocks use the
libc's functions, just with a different initializer:
PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP
instead of
PTHREAD_RWLOCK_INITIALIZER.
And indeed, when I modify test-pthread-rwlock to use
PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP
instead of
PTHREAD_RWLOCK_INITIALIZER
it executes fast:
  test-pthread-rwlock (modified)    0.3 sec
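To make the one-line change concrete, here is a minimal sketch of the two static initializers side by side, plus a dynamic counterpart via pthread_rwlockattr_setkind_np, which as far as I can tell selects the same rwlock kind on glibc. The variable and function names (posix_lock, writer_pref_lock, init_writer_pref_rwlock, exercise_rwlock) are hypothetical and do not come from the Gnulib test. The preprocessor guards are an assumption for builds where the glibc-specific names are not visible; there, the code falls back to the POSIX default.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE 1  /* glibc exposes the _NP initializer macro only with this */
#endif
#include <pthread.h>

/* The POSIX-specified initializer, which the test is meant to exercise. */
pthread_rwlock_t posix_lock = PTHREAD_RWLOCK_INITIALIZER;

/* The glibc-specific initializer.  The #else branch is only a fallback for
   builds where the macro is not visible; both locks are then the same kind. */
#ifdef PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP
pthread_rwlock_t writer_pref_lock =
  PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP;
#else
pthread_rwlock_t writer_pref_lock = PTHREAD_RWLOCK_INITIALIZER;
#endif

/* Dynamic counterpart: initializes *lock through an attribute object.
   On glibc the attribute requests the writer-preferring, non-recursive
   kind; elsewhere the default attributes are used.
   Returns 0 on success, otherwise an error number. */
int init_writer_pref_rwlock (pthread_rwlock_t *lock)
{
  pthread_rwlockattr_t attr;
  int err = pthread_rwlockattr_init (&attr);
  if (err != 0)
    return err;
#ifdef __GLIBC__
  err = pthread_rwlockattr_setkind_np
          (&attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
#endif
  if (err == 0)
    err = pthread_rwlock_init (lock, &attr);
  pthread_rwlockattr_destroy (&attr);
  return err;
}

/* Takes and releases the lock once as a reader, then once as a writer.
   Returns 0 on success, otherwise the first error number. */
int exercise_rwlock (pthread_rwlock_t *lock)
{
  int err = pthread_rwlock_rdlock (lock);
  if (err == 0)
    err = pthread_rwlock_unlock (lock);
  if (err == 0)
    err = pthread_rwlock_wrlock (lock);
  if (err == 0)
    err = pthread_rwlock_unlock (lock);
  return err;
}
```

The locking calls are identical for all three locks; only the initialization differs, which is why the change to the test is a single line.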
3) This topic has been discussed in the glibc bug
https://sourceware.org/bugzilla/show_bug.cgi?id=13701
where I have raised my voice for a writer-preferring implementation.
It was turned down by Torvald Riegel with two arguments:
  * That a writer-preferring implementation would go against Riegel's
    new "scalable" implementation of rwlocks [1].
  * That handling different priorities was difficult to implement, and
    that therefore nothing should be changed for the case of equal
    priorities (as here) either [2].
The argument [1] does not make sense to me in view of the timings above.
The argument [2] never made sense (to me at least).
4) The time to log in and shut down (more precisely, from boot to the
login screen, and from the shutdown command to VM termination) is pretty
slow with 8 or 10 CPUs, but not with few CPUs. It could be caused by this
rwlock problem or by the kernel's scheduler; I don't know.
So, in summary, it's a glibc bug that has been closed as "WORKSFORME" and
will never be fixed [3].
In the test-pthread-rwlock test, we cannot just use
PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP, because the *purpose* of
the test is to check the behaviour of the rwlocks with the POSIX-specified
API, not with some alternative API.
Bruno
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c7
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c3
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=13701#c14