On 06/12/2018 12:16 AM, David Sterba wrote:
On Mon, Jun 11, 2018 at 10:50:54AM +0100, Filipe Manana wrote:
btrfs: replace uuid_mutex by device_list_mutex in
btrfs_open_devices
*
* the mutex can be very coarse and can cover long-running operations
*
* protects: updates to fs_devices counters like missing devices, rw
devices,
* seeding, structure cloning, openning/closing devices at mount/umount
time
generates some confusion since btrfs_open_devices(), after that
commit, no longer takes the uuid_mutex and it
updates some fs_devices counters (opened, open_devices, etc).
As uuid_mutex is a global fs_uuids lock for the per fsid operations
doesn't make any sense.
This problem is reproducible only for-4.18, misc-next if fine.
I am looking deeper.
What about the unprotected updates (increments) to fs_devices->opened
and fs_devices->open_devices?
Other functions are accessing/updating them while holding the uuid mutex.
The goal is to reduce usage of uuid_mutex only to protect search or
update of the fs_uuids list, everything else should be protected by the
device_list_mutex.
The commit 542c5908abfe84f7 (use device_list_mutex in
btrfs_open_devices) implements that but then the access to the ->opened
member is not protected consistently. There are patches that convert the
use to device_list_mutex but haven't been merged due to refinements or
pending review.
At this point I think we should revert the one commit 542c5908abfe84f7
as it introduces the locking problems and revisit the whole fs_devices
locking scheme again in the dex dev cycle. That will be post rc1 as
there might be more to revert.
I tried to narrow this, it appears some of the things that
circular locking dependency check report doesn't make sense.
Here below is what I find.. as of now.
The test case btrfs/004 can be simplified to.. which also
reproduces the problem.
---------------------8<-------------
$ cat 165
#! /bin/bash
# FS QA Test No. btrfs/165
#
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
here=`pwd`
tmp=/tmp/$$
status=1
noise_pid=0
_cleanup()
{
wait
rm -f $tmp.*
}
trap "_cleanup; exit \$status" 0 1 2 3 15
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch
rm -f $seqres.full
run_check _scratch_mkfs_sized $((2000 * 1024 * 1024))
run_check _scratch_mount
run_check $FSSTRESS_PROG -d $SCRATCH_MNT -w -p 1 -n 2000 $FSSTRESS_AVOID
run_check _scratch_unmount
echo "done"
status=0
exit
---------------------8<-------------
The circular locking dependency warning occurs at FSSTRESS_PROG.
And in particular at doproc() in xfstests/ltp/fsstress.c, randomly
at any of the command at
opdesc_t ops[] = { ..}
which involves calling mmap file operation and if there is something
to commit.
The commit transaction does need device_list_mutex which is also being
used for the btrfs_open_devices() in the commit 542c5908abfe84f7.
But btrfs_open_devices() is only called at mount, and mmap() can
establish only be established after the mount has completed. With
this give its unclear to me why the circular locking dependency check
is warning about this.
I feel until we have clarity about this and also solve other problem
related to the streamlining of uuid_mutex, I suggest we revert
542c5908abfe84f7. Sorry for the inconvenience.
Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html