On 17.10.19 22:39, David Sterba wrote:
> There's one potential memory ordering problem regarding
> eb::blocking_writers, although this hasn't been hit in practice. On TSO
> (total store ordering) architectures, the writes will always be seen in
> the right order.
>
> In case the calls in btrfs_set_lock_blocking_write are seen in this
> (wrong) order on another CPU:
>
> 0: /* blocking_writers == 0 */
> 1: write_unlock()
> 2: blocking_writers = 1
>
> btrfs_try_tree_read_lock, btrfs_tree_read_lock_atomic or
> btrfs_try_tree_write_lock would have to squeeze all of this between 1:
> and 2:
This is only a problem for the unlocked (optimistic) accesses in those
functions, simply because from their POV eb->lock plays no role, i.e.
they don't know about it at all.

This implies there needs to be yet another synchronization/ordering
mechanism just for blocking_writers. But further down you say there is
no read-side barrier, because observing the accesses in a particular
order is not important for correctness.

Given this, I fail to see what ordering this smp_wmb actually affects.
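
To make the unlocked case concrete, here is a minimal userspace sketch
of that optimistic-check pattern (purely illustrative: the simplified
struct and function are mine, with pthreads and C11 atomics standing in
for the kernel primitives):

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdbool.h>

  struct eb {
          atomic_int blocking_writers;
          pthread_rwlock_t lock;
  };

  static bool try_tree_read_lock(struct eb *eb)
  {
          /* 1: unlocked (optimistic) check, no ordering vs. eb->lock */
          if (atomic_load_explicit(&eb->blocking_writers,
                                   memory_order_relaxed))
                  return false;

          /* 2: take the lock */
          if (pthread_rwlock_tryrdlock(&eb->lock) != 0)
                  return false;

          /* 3: re-check under the lock; if the writer's store to
           * blocking_writers becomes visible only after its unlock,
           * both checks can pass although a blocking writer exists */
          if (atomic_load_explicit(&eb->blocking_writers,
                                   memory_order_relaxed)) {
                  pthread_rwlock_unlock(&eb->lock);
                  return false;
          }
          return true;
  }

The point is that step 1 runs with no lock held, so no amount of
ordering on eb->lock itself can constrain what that check observes.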
>
> - check if blocking_writers is 0 (it is, continue)
> - try read lock, read lock or write lock (succeeds after 1:)
> - check blocking_writers again (continue)
>
> All of that assumes that the reordered writes can survive for quite some
> time (unlikely if they're in the internal store buffer).
>
> Another point against observing that in practice is that
> blocking_writers and the lock are on the same cacheline (64B), so it's
> more likely both values are stored in order, or some sort of pending
> write flush will update blocking_writers and the rwlock before the try
> lock happens. Assuming the CPUs work like that and don't hide other
> surprises.
>
> struct extent_buffer {
> u64 start; /* 0 8 */
> long unsigned int len; /* 8 8 */
> long unsigned int bflags; /* 16 8 */
> struct btrfs_fs_info * fs_info; /* 24 8 */
> spinlock_t refs_lock; /* 32 4 */
> atomic_t refs; /* 36 4 */
> atomic_t io_pages; /* 40 4 */
> int read_mirror; /* 44 4 */
> struct callback_head callback_head __attribute__((__aligned__(8))); /* 48 16 */
> /* --- cacheline 1 boundary (64 bytes) --- */
> pid_t lock_owner; /* 64 4 */
> int blocking_writers; /* 68 4 */
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> atomic_t blocking_readers; /* 72 4 */
> bool lock_nested; /* 76 1 */
>
> /* XXX 1 byte hole, try to pack */
>
> short int log_index; /* 78 2 */
> rwlock_t lock; /* 80 8 */
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> wait_queue_head_t write_lock_wq; /* 88 24 */
> wait_queue_head_t read_lock_wq; /* 112 24 */
> /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
> struct page * pages[16]; /* 136 128 */
>
> /* size: 264, cachelines: 5, members: 18 */
> /* sum members: 263, holes: 1, sum holes: 1 */
> /* forced alignments: 1 */
> /* last cacheline: 8 bytes */
> } __attribute__((__aligned__(8)));
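
(Side note: a layout dump like the above can be regenerated with
pahole, assuming the object file carries debug info, e.g.:

  pahole -C extent_buffer fs/btrfs/extent_io.o
)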
>
> Add the barriers for correctness sake.
>
> Signed-off-by: David Sterba <dste...@suse.com>
> ---
> fs/btrfs/locking.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
> index a12f3edc3505..e0e0430577aa 100644
> --- a/fs/btrfs/locking.c
> +++ b/fs/btrfs/locking.c
> @@ -110,6 +110,18 @@ void btrfs_set_lock_blocking_write(struct extent_buffer *eb)
> btrfs_assert_spinning_writers_put(eb);
> btrfs_assert_tree_locked(eb);
> WRITE_ONCE(eb->blocking_writers, 1);
> + /*
> + * Writers must be updated before the lock is released so
> + * that other threads see that in order. Otherwise this could
> + * theoretically lead to blocking_writers still being set to 1,
> + * which would be unexpected e.g. for spinning locks.
> + *
> + * There are no pairing read barriers for unlocked access as we
> + * don't strictly need them for correctness (e.g. in
> + * btrfs_try_tree_read_lock), and the unlock/lock is an implied
> + * barrier in other cases.
> + */
> + smp_wmb();
> write_unlock(&eb->lock);
That comment sounds to me as if you only care about the readers of
blocking_writers _after_ they have acquired eb::lock for reading. In
that case the smp_wmb is pointless, because write_unlock/write_lock
already imply release/acquire semantics.

Unlocked readers are not affected by this wmb.
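
To spell out the release/acquire point with C11 atomics (a toy spinlock
standing in for eb->lock; this is a sketch of the semantics, not the
kernel code):

  #include <stdatomic.h>

  static atomic_int blocking_writers;
  static atomic_int lock;   /* 0 = unlocked, 1 = held */

  static void toy_lock(atomic_int *l)
  {
          /* acquire: pairs with the release in toy_unlock() */
          while (atomic_exchange_explicit(l, 1, memory_order_acquire))
                  ;
  }

  static void toy_unlock(atomic_int *l)
  {
          /* release: all stores before this are visible to whoever
           * acquires the lock next */
          atomic_store_explicit(l, 0, memory_order_release);
  }

  static void writer_side(void)
  {
          atomic_store_explicit(&blocking_writers, 1,
                                memory_order_relaxed);
          /* no extra wmb needed for locked readers: the release
           * below already orders the store above */
          toy_unlock(&lock);
  }

  static int locked_reader_side(void)
  {
          toy_lock(&lock);
          /* if this acquire observes the writer's release, it is
           * guaranteed to see blocking_writers == 1 */
          int bw = atomic_load_explicit(&blocking_writers,
                                        memory_order_relaxed);
          toy_unlock(&lock);
          return bw;
  }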
> }
> }
> @@ -316,7 +328,9 @@ void btrfs_tree_unlock(struct extent_buffer *eb)
> /*
> * We need to order modifying blocking_writers above with
> * actually waking up the sleepers to ensure they see the
> - * updated value of blocking_writers
> + * updated value of blocking_writers.
> + *
> + * cond_wake_up calls smp_mb
> */
> cond_wake_up(&eb->write_lock_wq);
> } else {
>
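
For completeness, the smp_mb mentioned in that comment comes from the
waitqueue helper that cond_wake_up is built on; roughly this pattern
(paraphrased from the btrfs and waitqueue headers of that era, not
verbatim):

  static inline void cond_wake_up(struct wait_queue_head *wq)
  {
          /* wq_has_sleeper() issues a full smp_mb() before checking
           * waitqueue_active(), pairing with the barrier on the
           * waiting side, so the blocking_writers update is ordered
           * before the sleeper check */
          if (wq_has_sleeper(wq))
                  wake_up(wq);
  }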