On Thu, Apr 27, 2023 at 03:19:12PM +0100, Richard W.M. Jones wrote:
> We've long known that nbdkit-memory-plugin with the default sparse
> array allocator suffers because a global lock must be taken whenever
> any read or write operation is performed.  This commit aims to safely
> improve performance by converting the lock into a read-write lock.

I searched for other approaches, and found this to be an interesting read:

https://stackoverflow.com/questions/2407558/pthreads-reader-writer-locks-upgrading-read-lock-to-write-lock

One of the suggestions there is using fcntl() locking (which does
support upgrade attempts) instead of pthread_rwlock*.  Might be
interesting to code up and compare an approach along those lines.
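Roughly what I had in mind (an untested sketch, assuming Linux open file
description (OFD) locks so that each thread takes the lock through its own
file descriptor on a scratch lock file; plain per-process POSIX record
locks would not exclude threads within a single process):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdbool.h>

  /* Try to upgrade this file description's read lock to a write lock.
   * Returns true on success; on failure the caller must drop its read
   * lock and redo the whole request, much like the rwlock restart below.
   */
  static bool
  try_upgrade_to_write_lock (int fd)
  {
    struct flock fl = {
      .l_type = F_WRLCK,
      .l_whence = SEEK_SET,
      .l_start = 0,
      .l_len = 0,                 /* 0 = the whole file */
    };

    /* F_OFD_SETLK is non-blocking: it converts the existing read lock
     * to a write lock if no other reader holds the range, and fails
     * with EAGAIN otherwise (F_OFD_SETLKW would block instead).
     */
    return fcntl (fd, F_OFD_SETLK, &fl) == 0;
  }

Whether the extra syscall per upgrade attempt actually beats a
pthread_rwlock is exactly the thing that would need measuring.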

> +++ b/common/allocators/sparse.c
> @@ -129,11 +129,20 @@ DEFINE_VECTOR_TYPE (l1_dir, struct l1_entry);
>  struct sparse_array {
>    struct allocator a;           /* Must come first. */
>  
> -  /* This lock is highly contended.  When hammering the memory plugin
> -   * with 8 fio threads, about 30% of the total system time was taken
> -   * just waiting for this lock.  Fixing this is quite hard.
> +  /* The shared (read) lock must be held if you just want to access
> +   * the data without modifying any of the L1/L2 metadata or
> +   * allocating or freeing any page.
> +   *
> +   * To modify the L1/L2 metadata including allocating or freeing any
> +   * page you must hold the exclusive (write) lock.
> +   *
> +   * Because POSIX rwlocks are not upgradable this presents a problem.
> +   * We solve it below by speculatively performing the request while
> +   * holding the shared lock, but if we run into an operation that
> +   * needs to update the metadata, we restart the entire request
> +   * holding the exclusive lock.
>     */
> -  pthread_mutex_t lock;
> +  pthread_rwlock_t lock;
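(For anyone skimming the thread: the restart pattern that comment
describes boils down to something like the sketch below.  The helper
names are placeholders of mine, not the actual functions in sparse.c.)

  #include <errno.h>
  #include <pthread.h>
  #include <stdint.h>

  static int
  sparse_pwrite_sketch (struct sparse_array *sa, const void *buf,
                        uint64_t count, uint64_t offset)
  {
    int r;

    /* Optimistic pass: shared lock, no L1/L2 changes and no page
     * allocation allowed; the hypothetical helper returns -EAGAIN if
     * it would have to allocate.
     */
    pthread_rwlock_rdlock (&sa->lock);
    r = do_pwrite_no_alloc (sa, buf, count, offset);
    pthread_rwlock_unlock (&sa->lock);

    if (r == -EAGAIN) {
      /* Restart the entire request under the exclusive lock, where
       * allocating pages and updating the metadata is permitted.
       */
      pthread_rwlock_wrlock (&sa->lock);
      r = do_pwrite_alloc (sa, buf, count, offset);
      pthread_rwlock_unlock (&sa->lock);
    }
    return r;
  }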

But as written, this does look like a nice division of labor that
reduces serialization in the common case.  And yes, pthread_rwlock does
look a bit heavier-weight than a bare mutex, which explains the slowdown
on the read-heavy workload even while the write-heavy workload improves.

Reviewed-by: Eric Blake <ebl...@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
