On Wed, Jun 17, 2015 at 3:36 PM, Jeff Mahoney <[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 6/17/15 10:32 AM, Jeff Mahoney wrote:
>> On 6/17/15 9:24 AM, Filipe David Manana wrote:
>>> On Wed, Jun 17, 2015 at 11:04 AM, Filipe David Manana
>>> <[email protected]> wrote:
>>>> On Mon, Jun 15, 2015 at 2:41 PM,  <[email protected]> wrote:
>>>>> From: Jeff Mahoney <[email protected]>
>>>>>
>>>>> The cleaner thread may already be sleeping by the time we
>>>>> enter close_ctree.  If that's the case, we'll skip removing
>>>>> any unused block groups queued for removal, even during a
>>>>> normal umount. They'll be cleaned up automatically at next
>>>>> mount, but users expect a umount to be a clean
>>>>> synchronization point, especially when used on
>>>>> thin-provisioned storage with -odiscard.  We also explicitly
>>>>> remove unused block groups in the ro-remount path for the
>>>>> same reason.
>>>>>
>>>>> Signed-off-by: Jeff Mahoney <[email protected]>
>>>> Reviewed-by: Filipe Manana <[email protected]> Tested-by:
>>>> Filipe Manana <[email protected]>
>>>>
>>>>> --- fs/btrfs/disk-io.c |  9 +++++++++ fs/btrfs/super.c   |
>>>>> 11 +++++++++++ 2 files changed, 20 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index
>>>>> 2ef9a4b..2e47fef 100644 --- a/fs/btrfs/disk-io.c +++
>>>>> b/fs/btrfs/disk-io.c @@ -3710,6 +3710,15 @@ void
>>>>> close_ctree(struct btrfs_root *root)
>>>>> cancel_work_sync(&fs_info->async_reclaim_work);
>>>>>
>>>>> if (!(fs_info->sb->s_flags & MS_RDONLY)) { +               /*
>>>>> + * If the cleaner thread is stopped and there are + * block
>>>>> groups queued for removal, the deletion will be + * skipped
>>>>> when we quit the cleaner thread. +                */ +
>>>>> mutex_lock(&root->fs_info->cleaner_mutex); +
>>>>> btrfs_delete_unused_bgs(root->fs_info); +
>>>>> mutex_unlock(&root->fs_info->cleaner_mutex); + ret =
>>>>> btrfs_commit_super(root); if (ret) btrfs_err(fs_info,
>>>>> "commit super ret %d", ret); diff --git a/fs/btrfs/super.c
>>>>> b/fs/btrfs/super.c index 9e66f5e..2ccd8d4 100644 ---
>>>>> a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1539,6
>>>>> +1539,17 @@ static int btrfs_remount(struct super_block *sb,
>>>>> int *flags, char *data)
>>>>>
>>>>> sb->s_flags |= MS_RDONLY;
>>>>>
>>>>> +               /* +                * Setting MS_RDONLY will
>>>>> put the cleaner thread to +                * sleep at the
>>>>> next loop if it's already active. +                * If it's
>>>>> already asleep, we'll leave unused block +                *
>>>>> groups on disk until we're mounted read-write again +
>>>>> * unless we clean them up here. +                */ +
>>>>> mutex_lock(&root->fs_info->cleaner_mutex); +
>>>>> btrfs_delete_unused_bgs(fs_info); +
>>>>> mutex_unlock(&root->fs_info->cleaner_mutex);
>>
>>> So actually, this allows for a deadlock after the patch I sent
>>> out last week:
>>
>>> https://patchwork.kernel.org/patch/6586811/
>>
>>> In that patch delete_unused_bgs is no longer called under the
>>> cleaner_mutex, and making it so, will cause a deadlock with/ru
>>> relocation.
>>
>>> Even without that patch, I don't think you need using this mutex
>>>  anyway - no 2 tasks running this function can get the same bg
>>> from the fs_info->unused_bgs list.
>>
>> I was hitting crashes during umount when xfstests would do
>> remount-ro and umount in quick succession.  I can go back and
>> confirm this, but I believe I was encountering a race between the
>> cleaner thread and umount after being set read-only.  It didn't
>> trigger all the time.  My hypothesis is that if the cleaner thread
>> was running and had a lot of work to do, it could start before set
>> MS_RDONLY and still be performing work through the remount and into
>> the umount.  Ro-remount would have set MS_RDONLY so we skip the
>> btrfs_super_commit in close_ctree and then blow up afterwards.
>>
>> Taking the cleaner mutex means we either wait until the cleaner
>> thread has finished or we put it to sleep on the next loop before
>> it does anything.  In either case, it's safe.  It could just has
>> easily been:
>>
>> mutex_lock(&root->fs_info->cleaner_mutex);
>> mutex_unlock(&root->fs_info->cleaner_mutex);
>>
>> btrfs_delete_unused_bgs(fs_info);
>>
>> I think it actually was in a previous version I was testing.  It
>> probably should go back to that version so that we don't end up
>> confusing it with the new mutex you introduced in your patch.
>
> It looks like your:
> [PATCH] Btrfs: fix crash on close_ctree() if cleaner starts new
> transaction
>
> would also fix this in a more general case.  We can drop taking the
> cleaner mutex here.

Cool, thanks Jeff.

>
> - -Jeff
>
> - --
> Jeff Mahoney
> SUSE Labs
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
>
> iQIcBAEBAgAGBQJVgYYDAAoJEB57S2MheeWyxIcQAIGwFvP1bL4C8Oa3WyFL/tjE
> QITNDQZGYXEKfFqRWdHEAeFJ8kv234xo/tx7Ml0Txd8DFrqzDwXSxv6deLzDiiTT
> gymMdBKO3x7TLKZTxnyDXYEUDHM72IMOUS2el3wOOsc61rL1KajFEWySGtAA80pk
> bIUH6uosRTXhpXBRe080mc9XPhtfIQyCC8nroJHYazNwT3VWrvbhDaZPM3npNttj
> 5glsCz7ieseiWKqFCIlYC5yCgpst79U7D8M75Jo0yslvtZNpZOMR3YhvyQakj5hG
> p/CFRfbdFGnl3wKv+ACyu7XlewqoA9LwkB5Sbjzd4XbS3n7J4gch043b+BbIl2SA
> VghNTTI+tm7KKvMa3fghtedooVYu6DjdhU58VEWOBtHaDiWntSmd0FqzUCqAotxC
> fwEmMWCWCWR1E0etRUrnbO1DGltkR38ost7cvXOPXUUdvv3Hy22mTfWW73YwsWXW
> kwmG2V+IdgOWHDMxQCnj55/NbYep+/TiVjDPJnOuCn8tD5Tw+zHxtRbXhVcyKpGj
> jJXKb9uxDhKpsisz8HQJHf1uMLFJ3qzCqgYxysbc2PqlzylFfY2aefYWSPmrE6y4
> 6OJW7gTr75PzrGGm7gM1sPiPQLuFNEFBEi0Ak7ad6Q6SAAV339r+h00sg4Q1adVu
> 2JedYHUeFDUjAGAgft0G
> =SQ/a
> -----END PGP SIGNATURE-----



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to