On 06/12/2018 12:16 AM, David Sterba wrote:
On Mon, Jun 11, 2018 at 10:50:54AM +0100, Filipe Manana wrote:
        btrfs: replace uuid_mutex by device_list_mutex in
btrfs_open_devices

   *
   * the mutex can be very coarse and can cover long-running operations
   *
   * protects: updates to fs_devices counters like missing devices, rw
devices,
   * seeding, structure cloning, openning/closing devices at mount/umount
time

generates some confusion since btrfs_open_devices(), after that
commit, no longer takes the uuid_mutex and it
updates some fs_devices counters (opened, open_devices, etc).

  As uuid_mutex is a global fs_uuids lock for the per fsid operations
  doesn't make any sense.

  This problem is reproducible only for-4.18, misc-next if fine.
  I am looking deeper.

What about the unprotected updates (increments) to fs_devices->opened
and fs_devices->open_devices?
Other functions are accessing/updating them while holding the uuid mutex.

The goal is to reduce usage of uuid_mutex only to protect search or
update of the fs_uuids list, everything else should be protected by the
device_list_mutex.

The commit 542c5908abfe84f7 (use device_list_mutex in
btrfs_open_devices) implements that but then the access to the ->opened
member is not protected consistently. There are patches that convert the
use to device_list_mutex but haven't been merged due to refinements or
pending review.

At this point I think we should revert the one commit 542c5908abfe84f7
as it introduces the locking problems and revisit the whole fs_devices
locking scheme again in the dex dev cycle. That will be post rc1 as
there might be more to revert.



 I tried to narrow this, it appears some of the things that
 circular locking dependency check report doesn't make sense.
 Here below is what I find.. as of now.

 The test case btrfs/004 can be simplified to.. which also
 reproduces the problem.

---------------------8<-------------
$ cat 165
#! /bin/bash
# FS QA Test No. btrfs/165
#

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"

here=`pwd`
tmp=/tmp/$$
status=1
noise_pid=0

_cleanup()
{
        wait
        rm -f $tmp.*
}
trap "_cleanup; exit \$status" 0 1 2 3 15

# get standard environment, filters and checks
. ./common/rc
. ./common/filter

# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch

rm -f $seqres.full

run_check _scratch_mkfs_sized $((2000 * 1024 * 1024))
run_check _scratch_mount
run_check $FSSTRESS_PROG -d $SCRATCH_MNT -w -p 1 -n 2000 $FSSTRESS_AVOID
run_check _scratch_unmount

echo "done"
status=0
exit
---------------------8<-------------

 The circular locking dependency warning occurs at FSSTRESS_PROG.
 And in particular at doproc() in xfstests/ltp/fsstress.c, randomly
 at any of the command at
 opdesc_t        ops[] = { ..}
 which involves calling mmap file operation and if there is something
 to commit.

 The commit transaction does need device_list_mutex which is also being
 used for the btrfs_open_devices() in the commit 542c5908abfe84f7.

 But btrfs_open_devices() is only called at mount, and mmap() can
 establish only be established after the mount has completed. With
 this give its unclear to me why the circular locking dependency check
 is warning about this.

 I feel until we have clarity about this and also solve other problem
 related to the streamlining of uuid_mutex, I suggest we revert
 542c5908abfe84f7. Sorry for the inconvenience.

Thanks, Anand

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to