On 5/3/18 2:23 AM, Nikolay Borisov wrote: > > > On 3.05.2018 00:11, je...@suse.com wrote: >> From: Jeff Mahoney <je...@suse.com> >> >> Hi Dave - >> >> Here's the updated patchset for the rescan races. This fixes the issue >> where we'd try to start multiple workers. It introduces a new "ready" >> bool that we set during initialization and clear while queuing the worker. >> The queuer is also now responsible for most of the initialization. >> >> I have a separate patch set start that gets rid of the racy mess surrounding >> the rescan worker startup. We can handle it in btrfs_run_qgroups and >> just set a flag to start it everywhere else. > I'd be interested in seeing those patches. Some time ago I did send a > patch which cleaned up the way qgroup rescan was initiated. It was done > from "btrfs_run_qgroups" and I think this is messy. Whatever we do we > ought to really have well-defined semantics when qgroups rescan are run, > preferably we shouldn't be conflating rescan + run (unless there is > _really_ good reason to do). In the past the rescan from scan was used > only during qgroup enabling.
I think btrfs_run_qgroups is the place to do it. Here's why: 2773 int 2774 btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info) 2775 { 2776 int ret = 0; 2777 struct btrfs_trans_handle *trans; 2778 2779 ret = qgroup_rescan_init(fs_info, 0, 1); 2780 if (ret) 2781 return ret; 2782 2783 /* 2784 * We have set the rescan_progress to 0, which means no more 2785 * delayed refs will be accounted by btrfs_qgroup_account_ref. 2786 * However, btrfs_qgroup_account_ref may be right after its call 2787 * to btrfs_find_all_roots, in which case it would still do the 2788 * accounting. 2789 * To solve this, we're committing the transaction, which will 2790 * ensure we run all delayed refs and only after that, we are 2791 * going to clear all tracking information for a clean start. 2792 */ 2793 2794 trans = btrfs_join_transaction(fs_info->fs_root); 2795 if (IS_ERR(trans)) { 2796 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN; 2797 return PTR_ERR(trans); 2798 } 2799 ret = btrfs_commit_transaction(trans); 2800 if (ret) { 2801 fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN; 2802 return ret; 2803 } 2804 2805 qgroup_rescan_zero_tracking(fs_info); 2806 2807 queue_rescan_worker(fs_info); 2808 return 0; 2809 } The delayed ref race should exist anywhere we initiate a rescan outside of initially enabling qgroups. We already zero the tracking and queue the rescan worker in btrfs_run_qgroups for when we enable qgroups. Why not just always queue the worker there so the initialization and execution has a clear starting point? There are a few other races I'd like to fix as well. We call btrfs_run_qgroups directly from btrfs_ioctl_qgroup_assign, which is buggy since btrfs_add_qgroup_relation only checks to see if the quota_root exists. It will exist as soon as btrfs_quota_enable runs but we won't have committed the transaction yet. The call will end up enabling quotas in the middle of a transaction. -Jeff -- Jeff Mahoney SUSE Labs
signature.asc
Description: OpenPGP digital signature