On Mon, Apr 30, 2018 at 05:48:45PM +0800, Anand Jain wrote:
> We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance(),
> which is not called during the remount. So when resuming from the
> paused balance we hit BUG.
> 
>  kernel: kernel BUG at fs/btrfs/volumes.c:3890!
>  ::
>  kernel:  balance_kthread+0x51/0x60 [btrfs]
>  kernel:  kthread+0x111/0x130
>  ::
>  kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8
> 
> Reproducer:
>   On a mounted BTRFS.
> 
>   btrfs balance start --full-balance /btrfs
>   btrfs balance pause /btrfs
>   mount -o remount,ro /dev/sdb /btrfs
>   mount -o remount,rw /dev/sdb /btrfs
> 
> To fix this set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async()
> instead of btrfs_recover_balance().
> 
> Signed-off-by: Anand Jain <anand.j...@oracle.com>
> ---
> v1->v2: btrfs_resume_balance_async() can be called only from remount or
> mount, we don't need to hold fs_info->balance_lock.

For mount it's ok, there's nothing that can run in parallel, but remount
ro->rw that resumes the balance can be run with the ioctl that checks
balance status in parallel, at any time. As the fs_info::balance_ctl is
set up, btrfs_ioctl_balance_progress will proceed and return the current
status.


> Strictly speaking we should rather keep the balance at the paused state
> unless it is resumed by the user again, that means neither mount nor
> remount-rw should resume the balance automatically, former case needs
> writing balance status to the disk. Which needs compatibility
> verification.  So for now just avoid BUG. 
> 
>  fs/btrfs/volumes.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 3e6983a169c4..64bcaf25908b 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4115,6 +4115,8 @@ int btrfs_resume_balance_async(struct btrfs_fs_info 
> *fs_info)
>               return 0;
>       }
>  
> +     fs_info->balance_ctl->b_flags |= BTRFS_BALANCE_RESUME;

Though there's no update operation on flags, I think it's still better
to add the locks that set the resume status. And a comment why it's
here.

> +
>       tsk = kthread_run(balance_kthread, fs_info, "btrfs-balance");
>       return PTR_ERR_OR_ZERO(tsk);
>  }
> @@ -4156,7 +4158,6 @@ int btrfs_recover_balance(struct btrfs_fs_info *fs_info)
>  
>       bctl->fs_info = fs_info;
>       bctl->flags = btrfs_balance_flags(leaf, item);
> -     bctl->flags |= BTRFS_BALANCE_RESUME;

Also, it does not hurt to leave this here, as it logically matches what
the function does: we found the balance item, thus we set the status
accordingly.

Please update and resend, I'd like to push this to 4.17-rc as it's a
crash fix with a reproducer. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to