On Mon, Apr 16, 2018 at 02:10:08PM +0800, Anand Jain wrote:
> 
> 
> On 04/04/2018 02:34 AM, David Sterba wrote:
> > Replace a WARN_ON with a proper check and message in case something goes
> > really wrong and resumed balance cannot set up its exclusive status.
> 
> > The check is a user friendly assertion, I don't expect to ever happen
> > under normal circumstances.
> > 
> > Also document that the paused balance starts here and owns the exclusive
> > op status.
> 
> 
> > Signed-off-by: David Sterba <dste...@suse.com>
> > ---
> >   fs/btrfs/volumes.c | 16 ++++++++++++++--
> >   1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index eb78c1d0ce2b..843982a2cbdb 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -3982,6 +3982,20 @@ int btrfs_recover_balance(struct btrfs_fs_info 
> > *fs_info)
> >     struct btrfs_key key;
> >     int ret;
> >   
> > +   /*
> > +    * This should never happen, as the paused balance state is recovered
> > +    * during mount without any chance of other exclusive ops to collide.
> > +    * Let it fail early and do not continue mount.
> > +    *
> > +    * Otherwise, this gives the exclusive op status to balance and keeps
> > +    * in paused state until user intervention (cancel or umount).
> > +    */
> > +   if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
> > +           btrfs_err(fs_info,
> > +                   "cannot set exclusive op status to resume balance");
> > +           return -EINVAL;
> > +   }
> >
> 
>   We need the test_and_set_bit(BTRFS_FS_EXCL_OP) only if we confirm that
>   there is a pending balance. Its better to test and set at the same
>   place as WARN_ON before.

Right.  And the patch is wrong, because we need to set up
fs_info::balance_ctl in all cases, otherwise unpausing balance would not
work as expected. The only change should be a better message and maybe
not even that, as it's just preparing the state but the exclusive op is
claimed later in btrfs_resume_balance_async.

I'll revisit how the error handling is done after resuming balance or
dev-replace, this is considered a hard failure (mount, rw remount) but I
don't think it's necessary.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to