On 2019/10/18 3:39 AM, David Sterba wrote:
> Signed-off-by: David Sterba <dste...@suse.com>

Great document.

Some questions inlined below.
> ---
>  fs/btrfs/locking.c | 110 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 96 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
> index e0e0430577aa..2a0e828b4470 100644
> --- a/fs/btrfs/locking.c
> +++ b/fs/btrfs/locking.c
> @@ -13,6 +13,48 @@
>  #include "extent_io.h"
>  #include "locking.h"
>  
> +/*
> + * Extent buffer locking
> + * ~~~~~~~~~~~~~~~~~~~~~
> + *
> + * The locks use a custom scheme that allows more operations than are
> + * available from current locking primitives. The building blocks are still
> + * rwlock and wait queues.
> + *
> + * Required semantics:
> + *
> + * - reader/writer exclusion
> + * - writer/writer exclusion
> + * - reader/reader sharing
> + * - spinning lock semantics
> + * - blocking lock semantics
> + * - try-lock semantics for readers and writers
> + * - one level nesting, allowing read lock to be taken by the same thread that
> + *   already has write lock

Is there an example of this scenario? IIRC there is only one user of the
nested lock. Although we know it has existed for a long time, I guess it
would be better to try removing such call sites?
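
For illustration, a minimal user-space sketch of the one-level nesting rule
quoted above. All names (eb_nest, nest_*) are hypothetical and this is not
the kernel implementation. It assumes nesting is only granted while the
write lock is in blocking mode, since a thread that still held the rwlock
for write would otherwise spin on its own lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; fields only loosely follow the patch. */
struct eb_nest {
	int lock_owner;		/* id of the thread holding the write lock, 0 if none */
	int blocking_writers;	/* non-zero while the write lock is in blocking mode */
	bool lock_nested;	/* the owner currently holds a nested read lock */
};

/* Take the write lock and immediately switch it to blocking mode. */
static void nest_write_lock_blocking(struct eb_nest *eb, int tid)
{
	assert(eb->lock_owner == 0);
	eb->lock_owner = tid;
	eb->blocking_writers = 1;
}

/* Return true if the read lock was granted as a nested lock. */
static bool nest_read_lock(struct eb_nest *eb, int tid)
{
	if (eb->blocking_writers && eb->lock_owner == tid) {
		assert(!eb->lock_nested);	/* only one level of nesting */
		eb->lock_nested = true;
		return true;
	}
	return false;	/* a real lock would wait for the writer here */
}

/* Drop the nested read lock; must happen before the write unlock. */
static void nest_read_unlock(struct eb_nest *eb, int tid)
{
	assert(eb->lock_nested && eb->lock_owner == tid);
	eb->lock_nested = false;
}

/* Drop the write lock; this ends the context in which nesting is allowed. */
static void nest_write_unlock(struct eb_nest *eb, int tid)
{
	assert(eb->lock_owner == tid);
	assert(!eb->lock_nested);
	eb->blocking_writers = 0;
	eb->lock_owner = 0;
}
```

In this model the required order is write lock, nested read lock, read
unlock, write unlock; any other thread asking for the read lock in between
is refused (it would have to wait).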

> + *
> + * The extent buffer locks (also called tree locks) manage access to eb data.

One of my concerns about "access to eb data" is that accessing some
members doesn't really need any lock at all.

Some members should never change during the lifespan of an eb, e.g.
bytenr, transid.

Some code already takes advantage of this, like tree-checker
checking the transid without holding a lock.
Not sure if we should make use of this.

> + * We want concurrency of many readers and safe updates. The underlying locking
> + * is done by read-write spinlock and the blocking part is implemented using
> + * counters and wait queues.
> + *
> + * spinning semantics - the low-level rwlock is held so all other threads that
> + *                      want to take it are spinning on it.
> + *
> + * blocking semantics - the low-level rwlock is not held but the counter
> + *                      denotes how many times the blocking lock was held;
> + *                      sleeping is possible

What about an example/state machine of all read/write and
spinning/blocking combinations?
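
As a rough sketch of what such a state table could look like, here is a
single-threaded user-space model built only from the rules stated in the
comment: readers wait for any writer, writers wait for every reader and
writer, and "blocking" means the rwlock is dropped while a counter keeps the
lock logically held. The names (eb_state, model_*) are hypothetical, nesting
is ignored, and this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one extent buffer lock; not the kernel structure. */
struct eb_state {
	int spinning_readers;	/* readers holding the rwlock */
	int blocking_readers;	/* readers that dropped the rwlock, may sleep */
	int spinning_writers;	/* 0 or 1: writer holding the rwlock */
	int blocking_writers;	/* 0 or 1: writer that dropped the rwlock */
};

/* A new reader is excluded by any writer, spinning or blocking. */
static bool model_reader_may_enter(const struct eb_state *s)
{
	return s->spinning_writers == 0 && s->blocking_writers == 0;
}

/* A new writer is excluded by any reader or writer in either mode. */
static bool model_writer_may_enter(const struct eb_state *s)
{
	return model_reader_may_enter(s) &&
	       s->spinning_readers == 0 && s->blocking_readers == 0;
}

static void model_read_lock(struct eb_state *s)
{
	assert(model_reader_may_enter(s));
	s->spinning_readers++;
}

/* spinning reader -> blocking reader */
static void model_set_blocking_read(struct eb_state *s)
{
	assert(s->spinning_readers > 0);
	s->spinning_readers--;
	s->blocking_readers++;
}

static void model_read_unlock(struct eb_state *s)
{
	assert(s->spinning_readers > 0);
	s->spinning_readers--;
}

static void model_read_unlock_blocking(struct eb_state *s)
{
	assert(s->blocking_readers > 0);
	s->blocking_readers--;
}

static void model_write_lock(struct eb_state *s)
{
	assert(model_writer_may_enter(s));
	s->spinning_writers = 1;
}

/* spinning writer -> blocking writer */
static void model_set_blocking_write(struct eb_state *s)
{
	assert(s->spinning_writers == 1);
	s->spinning_writers = 0;
	s->blocking_writers = 1;
}

/* One unlock covers both the spinning and the blocking write lock. */
static void model_write_unlock(struct eb_state *s)
{
	assert(s->spinning_writers + s->blocking_writers == 1);
	s->spinning_writers = 0;
	s->blocking_writers = 0;
}
```

The two predicates give the exclusion table (reader/reader shares,
everything involving a writer excludes), and the set_blocking helpers are
the only transitions between the spinning and blocking columns.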

Thanks,
Qu

> + *
> + * Write lock always allows only one thread to access the data.
> + *
> + *
> + * Debugging
> + * ~~~~~~~~~
> + *
> + * There are additional state counters that are asserted in various contexts,
> + * removed from non-debug build to reduce extent_buffer size and for
> + * performance reasons.
> + */
> +
>  #ifdef CONFIG_BTRFS_DEBUG
>  static inline void btrfs_assert_spinning_writers_get(struct extent_buffer *eb)
>  {
> @@ -80,6 +122,15 @@ static void btrfs_assert_tree_write_locks_get(struct extent_buffer *eb) { }
>  static void btrfs_assert_tree_write_locks_put(struct extent_buffer *eb) { }
>  #endif
>  
> +/*
> + * Mark already held read lock as blocking. Can be nested in write lock by the
> + * same thread.
> + *
> + * Use when there are potentially long operations ahead so other threads waiting
> + * on the lock will not actively spin but sleep instead.
> + *
> + * The rwlock is released and blocking reader counter is increased.
> + */
>  void btrfs_set_lock_blocking_read(struct extent_buffer *eb)
>  {
>       trace_btrfs_set_lock_blocking_read(eb);
> @@ -96,6 +147,14 @@ void btrfs_set_lock_blocking_read(struct extent_buffer *eb)
>       read_unlock(&eb->lock);
>  }
>  
> +/*
> + * Mark already held write lock as blocking.
> + *
> + * Use when there are potentially long operations ahead so other threads
> + * waiting on the lock will not actively spin but sleep instead.
> + *
> + * The rwlock is released and the blocking writers counter is set.
> + */
>  void btrfs_set_lock_blocking_write(struct extent_buffer *eb)
>  {
>       trace_btrfs_set_lock_blocking_write(eb);
> @@ -127,8 +186,13 @@ void btrfs_set_lock_blocking_write(struct extent_buffer *eb)
>  }
>  
>  /*
> - * take a spinning read lock.  This will wait for any blocking
> - * writers
> + * Lock the extent buffer for read. Wait for any writers (spinning or blocking).
> + * Can be nested in write lock by the same thread.
> + *
> + * Use when the locked section does only lightweight actions and busy waiting
> + * would be cheaper than making other threads do the wait/wake loop.
> + *
> + * The rwlock is held upon exit.
>   */
>  void btrfs_tree_read_lock(struct extent_buffer *eb)
>  {
> @@ -166,9 +230,10 @@ void btrfs_tree_read_lock(struct extent_buffer *eb)
>  }
>  
>  /*
> - * take a spinning read lock.
> - * returns 1 if we get the read lock and 0 if we don't
> - * this won't wait for blocking writers
> + * Lock extent buffer for read, optimistically expecting that there are no
> + * contending blocking writers. If there are, don't wait.
> + *
> + * Return 1 if the rwlock has been taken, 0 otherwise
>   */
>  int btrfs_tree_read_lock_atomic(struct extent_buffer *eb)
>  {
> @@ -188,8 +253,9 @@ int btrfs_tree_read_lock_atomic(struct extent_buffer *eb)
>  }
>  
>  /*
> - * returns 1 if we get the read lock and 0 if we don't
> - * this won't wait for blocking writers
> + * Try-lock for read. Don't block or wait for contending writers.
> + *
> + * Return 1 if the rwlock has been taken, 0 otherwise
>   */
>  int btrfs_try_tree_read_lock(struct extent_buffer *eb)
>  {
> @@ -211,8 +277,10 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb)
>  }
>  
>  /*
> - * returns 1 if we get the read lock and 0 if we don't
> - * this won't wait for blocking writers or readers
> + * Try-lock for write. May block until the lock is uncontended, but does not
> + * wait until it is free.
> + *
> + * Return 1 if the rwlock has been taken, 0 otherwise
>   */
>  int btrfs_try_tree_write_lock(struct extent_buffer *eb)
>  {
> @@ -233,7 +301,10 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb)
>  }
>  
>  /*
> - * drop a spinning read lock
> + * Release read lock. Must be used only if the lock is in spinning mode.  If
> + * the read lock is nested, must pair with read lock before the write unlock.
> + *
> + * The rwlock is not held upon exit.
>   */
>  void btrfs_tree_read_unlock(struct extent_buffer *eb)
>  {
> @@ -255,7 +326,11 @@ void btrfs_tree_read_unlock(struct extent_buffer *eb)
>  }
>  
>  /*
> - * drop a blocking read lock
> + * Release read lock, previously set to blocking by a pairing call to
> + * btrfs_set_lock_blocking_read(). Can be nested in write lock by the same
> + * thread.
> + *
> + * State of rwlock is unchanged, last reader wakes waiting threads.
>   */
>  void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
>  {
> @@ -279,8 +354,10 @@ void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
>  }
>  
>  /*
> - * take a spinning write lock.  This will wait for both
> - * blocking readers or writers
> + * Lock for write. Wait for all blocking and spinning readers and writers. This
> + * starts context where reader lock could be nested by the same thread.
> + *
> + * The rwlock is held for write upon exit.
>   */
>  void btrfs_tree_lock(struct extent_buffer *eb)
>  {
> @@ -307,7 +384,12 @@ void btrfs_tree_lock(struct extent_buffer *eb)
>  }
>  
>  /*
> - * drop a spinning or a blocking write lock.
> + * Release the write lock, either blocking or spinning (ie. there's no need
> + * for an explicit blocking unlock, like btrfs_tree_read_unlock_blocking).
> + * This also ends the context for nesting, the read lock must have been
> + * released already.
> + *
> + * Tasks blocked and waiting are woken, rwlock is not held upon exit.
>   */
>  void btrfs_tree_unlock(struct extent_buffer *eb)
>  {
> 
