On Thu, Sep 3, 2015 at 2:05 AM, Justin Maggard <jmaggar...@gmail.com> wrote:
> v2: Fix stupid error while making formatting changes...
>
> I was hitting a consistent NULL pointer dereference during shutdown that
> showed the trace running through end_workqueue_bio().  I traced it back to
> the endio_meta_workers workqueue being poked after it had already been
> destroyed.
>
> Eventually I found that the root cause was a qgroup rescan that was still
> in progress while we were stopping all the btrfs workers.
>
> Currently we explicitly pause balance and scrub operations in
> close_ctree(), but we do nothing to stop the qgroup rescan.  We should
> probably be doing the same for qgroup rescan, but that's a much larger
> change.  This small change is good enough to allow me to unmount without
> crashing.
>
> Signed-off-by: Justin Maggard <jmagg...@netgear.com>
> ---
>  fs/btrfs/qgroup.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index d904ee1..5bfcee9 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2278,7 +2278,7 @@ static void btrfs_qgroup_rescan_worker(struct 
> btrfs_work *work)
>                 goto out;
>
>         err = 0;
> -       while (!err) {
> +       while (!err && !btrfs_fs_closing(fs_info)) {
>                 trans = btrfs_start_transaction(fs_info->fs_root, 0);
>                 if (IS_ERR(trans)) {
>                         err = PTR_ERR(trans);
> @@ -2301,7 +2301,8 @@ out:
>         btrfs_free_path(path);
>
>         mutex_lock(&fs_info->qgroup_rescan_lock);
> -       fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
> +       if (!btrfs_fs_closing(fs_info))
> +               fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN;
>
>         if (err > 0 &&
>             fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT) {
> @@ -2330,7 +2331,9 @@ out:
>         }
>         btrfs_end_transaction(trans, fs_info->quota_root);
>
> -       if (err >= 0) {
> +       if (btrfs_fs_closing(fs_info)) {
> +               btrfs_info(fs_info, "qgroup scan paused");
> +       } else if (err >= 0) {
>                 btrfs_info(fs_info, "qgroup scan completed%s",
>                         err > 0 ? " (inconsistency flag cleared)" : "");
>         } else {

Justin, this is still racy (however much less racy than before).

Once we leave the loop because of the condition
btrfs_fs_closing(fs_info), we start a transaction and do some write
operation on the quota btree. While or before we do such write
operation, close_ctree() might have completed or be at a point where
such write operation will result in another null pointer dereference,
or accessing some dangling pointer, or leak a transaction that never
gets committed (because close_ctree() already stopped the transaction
kthread), etc, etc.

So in addition to what you did, you need to call
btrfs_qgroup_wait_for_completion(fs_info) at disk-io.c:close_ctree()
right after setting fs_info->closing to 1.

Otherwise it looks good.
Thanks.


> --
> 2.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to