Paul Eggert wrote:
> > 1. Install a toolchain for creating binaries linked against musl-libc:
> > $ sudo apt install musl-gcc
> > 2. Create a testdir for the module 'pthread-rwlock'.
> > 3. Build it with CC="gcc". You should still see the test failure after
> > 10 minutes.
> > 4. Build it with CC="musl-gcc". Run
> > $ time gltests/test-pthread-rwlock
> >
> > If 4. takes more than 30 seconds, your OS is probably running on top of
> > a hypervisor, and that is causing the problem.
> >
> > If 4. takes just a few seconds, it looks like a glibc bug.
>
> 4. takes just a few seconds on my platform.
Thanks. So, it looks like a glibc bug.
I have investigated this from three different angles.
1) In maint-tools/ I have added a directory test-programs/pthread-rwlock/
with a test program that analyzes how an implementation of POSIX rwlocks
handles the wait queue.
This is based on the same idea as m4/pthread_rwlock_rdlock.m4, but whereas
that macro extracts only a single bit of information about one particular
situation, the new test program exercises many situations, analyzes them,
and provides a textual summary.
Find here attached the results of this analysis.
glibc ≥ 2.25 with the default pthread_rwlock_t initializer stands out:
It is the *only* implementation for which the result contains the statement
"This implementation always prefers readers."
2) I have redone the timings of most platforms, each in a VirtualBox VM with
paravirtualization set to "minimal" (so as to avoid the bug mentioned in
<https://lists.gnu.org/archive/html/bug-gnulib/2024-06/msg00318.html>).
Find here attached the timings.
While there is sometimes a big difference between the columns "1 CPU" and
"2 CPUs", it can be attributed to single-thread optimizations. Therefore,
what we need to look at are the columns "2 CPUs", "4 CPUs", "8 CPUs".
While some platforms (Cygwin, Solaris 11, gnulib on mingw) show a moderate
increase in times across this progression, the result is:
glibc ≥ 2.25 with the default pthread_rwlock_t initializer stands out:
It is the *only* implementation for which the times in this progression
grow by a factor of more than 10.
So far, there is a correlation between the analysis results and the timings.
Is this correlation a causal relation?
3) To investigate this question, I modified the gnulib implementation on mingw
to prefer readers. Namely, in lib/windows-rwlock.c and lib/windows-timedrwlock.c,
in the function glwthread_[timed]rwlock_unlock, I changed the line
if (lock->waiting_writers.count > 0)
to
if (lock->waiting_writers.count > 0 && lock->waiting_readers.count == 0)
As expected, the analysis results are:
"This implementation always prefers readers."
And the timings are:
gnulib on mingw, modified to prefer readers
1 CPU 2 CPUs 4 CPUs 8 CPUs
0.787 2.962 385.999 > 7200
That is, a steep increase in running time from 2 CPUs to 8 CPUs.
This provides enough justification for gnulib to override the glibc behaviour
here. I will therefore go ahead and override
PTHREAD_RWLOCK_INITIALIZER
pthread_rwlock_init
pthread_rwlockattr_init
to use PREFER_WRITER or PREFER_WRITER_NONRECURSIVE, which both behave in a
sane way even in glibc ≥ 2.25.
We have known about the problem since 2018
<https://lists.gnu.org/archive/html/bug-gnulib/2018-06/msg00026.html>,
but had not realized the impact it has on systems with 8 or more CPUs.
Thanks for your report, Paul!
Bruno
How do the implementations work?
glibc/Linux <2.25 default, glibc/Linux PREFER_WRITER, AIX, Android
When releasing the last reader lock:
If at least one of the enqueued lock attempts is for reading, the
first one of them is granted.
Otherwise, the first of the waiting write attempts is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for writing, the
first one of them is granted.
Otherwise, one of the waiting read attempts is granted.
This implementation does not globally prefer readers, only when releasing
a reader lock.
This implementation does not globally prefer writers, only when releasing
a writer lock.
glibc/Linux PREFER_WRITER_NONRECURSIVE, FreeBSD, Cygwin
When releasing the last reader lock:
The first of the enqueued lock attempts is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for writing, the
first one of them is granted.
Otherwise, one of the waiting read attempts is granted.
This implementation does not prefer readers.
This implementation does not globally prefer writers, only when releasing
a writer lock.
glibc/Linux ≥2.25 default
When releasing the last reader lock:
If at least one of the enqueued lock attempts is for reading, the
first one of them is granted.
Otherwise, the first of the waiting write attempts is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for reading, one of
them is granted.
Otherwise, the first of the waiting write attempts is granted.
This implementation always prefers readers.
This implementation does not prefer writers.
glibc/Hurd
When releasing the last reader lock:
If at least one of the enqueued lock attempts is for reading, the
first one of them is granted.
Otherwise, the latest (LIFO!) waiting write attempt is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for writing, the
latest (LIFO!) one of them is granted.
Otherwise, the latest (LIFO!) of the waiting read attempts is granted.
This implementation does not globally prefer readers, only when releasing
a reader lock.
This implementation does not globally prefer writers, only when releasing
a writer lock.
This implementation is deterministic.
musl, OpenBSD
When releasing the last reader lock:
If at least one of the enqueued lock attempts is for reading, the
first one of them is granted.
Otherwise, the first of the waiting write attempts is granted.
When releasing a writer lock:
The first of the enqueued lock attempts is granted.
This implementation does not globally prefer readers, only when releasing
a reader lock.
This implementation does not prefer writers.
This implementation is deterministic.
macOS
When releasing the last reader lock:
The first of the enqueued lock attempts is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for writing, the
first of them is granted.
Otherwise, one of the waiting read attempts is granted.
This implementation does not prefer readers.
This implementation does not prefer writers.
NetBSD
When releasing the last reader lock:
The first of the enqueued lock attempts is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for writing, the
first one of them is granted.
Otherwise, the latest (LIFO!) of the waiting read attempts is granted.
This implementation does not prefer readers.
This implementation does not globally prefer writers, only when releasing
a writer lock.
This implementation is deterministic.
Solaris 10
When releasing the last reader lock:
If at least one of the enqueued lock attempts is for writing, the
latest (LIFO!) one of them is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for writing, one of
the waiting write attempts is granted (not necessarily the first one).
Otherwise, one of the waiting read attempts is granted.
This implementation does not prefer readers.
This implementation always prefers writers.
Solaris 11
When releasing the last reader lock:
If at least one of the enqueued lock attempts is for writing, one
of them is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for writing, one of
the waiting write attempts is granted (not necessarily the first one).
Otherwise, one of the waiting read attempts is granted.
This implementation does not prefer readers.
This implementation always prefers writers.
gnulib on native Windows
When releasing the last reader lock:
The first of the enqueued lock attempts is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for writing, the
first one of them is granted.
Otherwise, the first of the waiting read attempts is granted.
This implementation does not prefer readers.
This implementation does not globally prefer writers, only when releasing
a writer lock.
This implementation is deterministic.
gnulib on native Windows, modified to prefer readers
When releasing the last reader lock:
If at least one of the enqueued lock attempts is for reading, the
first one of them is granted.
Otherwise, the first of the waiting write attempts is granted.
When releasing a writer lock:
If at least one of the enqueued lock attempts is for reading, the
first of them is granted.
Otherwise, the first of the waiting write attempts is granted.
This implementation always prefers readers.
This implementation does not prefer writers.
This implementation is deterministic.
Timings of test-pthread-rwlock, created from
./gnulib-tool --create-testdir --dir=../testdir1 --single-configure
pthread-rwlock
in a VirtualBox VM (minimal paravirtualization) with 1, 2, 4, 8 CPUs.
Each value is the average of the real time of 3 runs of
"time ./test-pthread-rwlock", in seconds.
glibc/Linux <2.25 default (Debian 9.6)
1 CPU 2 CPUs 4 CPUs 8 CPUs
0.712 0.884 0.589 0.970
glibc/Linux PREFER_WRITER (Ubuntu 22.04)
1 CPU 2 CPUs 4 CPUs 8 CPUs
0.983 2.009 1.238 1.683
glibc/Linux PREFER_WRITER_NONRECURSIVE (Ubuntu 22.04)
1 CPU 2 CPUs 4 CPUs 8 CPUs
1.022 2.289 1.141 0.868
FreeBSD (FreeBSD 14.0)
1 CPU 2 CPUs 4 CPUs 8 CPUs
1.62 1.40 1.14 1.33
Cygwin (Cygwin 2.9.0)
1 CPU 2 CPUs 4 CPUs 8 CPUs
1.094 18.683 36.945 43.999
glibc/Linux ≥2.25 default (Ubuntu 22.04)
1 CPU 2 CPUs 4 CPUs 8 CPUs
1.032 1.998 3.515 67.596
glibc/Hurd
1 CPU 2 CPUs 4 CPUs 8 CPUs
0.433 0.520 0.713 0.630
musl (Alpine Linux 3.20)
1 CPU 2 CPUs 4 CPUs 8 CPUs
1.633 2.413 1.390 0.843
OpenBSD (OpenBSD 7.5)
1 CPU 2 CPUs 4 CPUs 8 CPUs
0.833 2.837 2.757 2.853
NetBSD (NetBSD 10.0)
1 CPU 2 CPUs 4 CPUs 8 CPUs
6.600 7.297 6.993 7.967
Solaris 11 (Solaris 11.4)
1 CPU 2 CPUs 4 CPUs 8 CPUs
1.997 19.697 22.924 28.039
gnulib on mingw
1 CPU 2 CPUs 4 CPUs 8 CPUs
0.763 9.769 15.542 15.954
gnulib on mingw, modified to prefer readers
1 CPU 2 CPUs 4 CPUs 8 CPUs
0.787 2.962 385.999 > 7200