On 2018-01-29 19:21, Nikolay Borisov wrote:
>
>
> On 29.01.2018 13:09, Qu Wenruo wrote:
>>
>>
>> On 2018-01-29 15:44, Nikolay Borisov wrote:
>>> Running generic/019 with qgroups on the scratch device enabled is
>>> almost guaranteed to trigger the BUG_ON in btrfs_free_tree_block. It's
>>> supposed to trigger only on -ENOMEM, in reality, however, it's possible
>>> to get -EIO from btrfs_qgroup_trace_extent_post. This function just
>>> finds the roots of the extent being tracked and sets the qrecord->old_roots
>>> list. If this operation fails nothing critical happens except the
>>> quota accounting can be considered wrong. In such case just set the
>>> INCONSISTENT flag for the quota and print a warning.
>>>
>>> Signed-off-by: Nikolay Borisov <[email protected]>
>>> ---
>>>
>>> V2:
>>>  * Always print a warning if btrfs_qgroup_trace_extent_post fails
>>>  * Set quota inconsistent flag if btrfs_qgroup_trace_extent_post fails
>>>
>>>  fs/btrfs/delayed-ref.c | 7 +++++--
>>>  fs/btrfs/qgroup.c      | 6 ++++--
>>>  2 files changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>> index a1a40cf382e3..5b2789a28a13 100644
>>> --- a/fs/btrfs/delayed-ref.c
>>> +++ b/fs/btrfs/delayed-ref.c
>>> @@ -820,8 +820,11 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info
>>> *fs_info,
>>>  			     num_bytes, parent, ref_root, level, action);
>>>  	spin_unlock(&delayed_refs->lock);
>>>
>>> -	if (qrecord_inserted)
>>> -		return btrfs_qgroup_trace_extent_post(fs_info, record);
>>> +	if (qrecord_inserted) {
>>> +		int ret = btrfs_qgroup_trace_extent_post(fs_info, record);
>>> +		if (ret < 0)
>>> +			btrfs_warn(fs_info, "Error accounting new delayed refs
>>> extent (err code: %d). Quota inconsistent", ret);
>>
>> Sorry that I didn't point it out in previous review, there are 2 callers
>> in delayed-ref.c using btrfs_qgroup_trace_extent_post().
>>
>> One is the one you're fixing, and the other one is
>> btrfs_add_delayed_data_ref().
>
> Yes, but the callers of btrfs_add_delayed_data_ref seem to be expecting
> error values and actually handling them.

Not exactly.

A quick search turns up another call chain where the return value of
btrfs_add_delayed_data_ref() is not handled:

walk_down_proc()
|- btrfs_dec_ref()
   |- __btrfs_mod_ref()
      |- btrfs_free_extent()
         |- btrfs_add_delayed_data_ref()
            |- btrfs_qgroup_trace_extent_post()

And this leads to another BUG_ON().
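
To illustrate how an error from the bottom of the chain above reaches the top, here is a minimal userspace sketch. It is not kernel code: every `*_model` function and `MODEL_BUG_ON` is a hypothetical stand-in, and it assumes each level simply returns the error of the level below while the top-level caller checks the result with a BUG_ON-style assertion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Records that a BUG_ON-style check would have fired (hypothetical). */
static bool bug_hit;

/* Userspace stand-in for BUG_ON(): record the hit instead of crashing. */
#define MODEL_BUG_ON(cond) do { if (cond) bug_hit = true; } while (0)

/* Stand-in for btrfs_qgroup_trace_extent_post(): 0 or a negative errno. */
static int qgroup_post_model(int err)
{
	assert(err <= 0);
	return err;
}

/* Each level just passes the error up, mirroring the chain in the mail. */
static int add_delayed_data_ref_model(int err) { return qgroup_post_model(err); }
static int free_extent_model(int err) { return add_delayed_data_ref_model(err); }
static int mod_ref_model(int err) { return free_extent_model(err); }
static int dec_ref_model(int err) { return mod_ref_model(err); }

/* Top of the chain: any non-zero return trips the BUG_ON-style check. */
static void walk_down_model(int err)
{
	int ret = dec_ref_model(err);

	MODEL_BUG_ON(ret);
}
```

In this model, calling `walk_down_model(-EIO)` sets `bug_hit`, while `walk_down_model(0)` does not, which is the situation described above: an -EIO from the qgroup helper is enough to hit the assertion.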
> So a failure doesn't
> necessarily mean the fs is in inconsistent state.

But at the cost of aborting the current transaction.

>
>>
>> So it would be even better if the warning message can be integrated into
>> btrfs_qgroup_trace_extent_post().
>
> See below why I don't think integrating the warning is a good idea. In
> the case being modified in this patch we will continue operating
> normally, hence the warning and INCONSISTENT flag make sense.
>
>>
>> Also btrfs_add_delayed_data_ref() needs to ignore the return
>> value from btrfs_qgroup_trace_extent_post().
>
> I don't think so, if we are able to handle failures as is the case in
> the delayed_data_ref case then we might abort the current transaction
> and this should leave the fs in a consistent state.

Here comes the trade-off.

Keep the on-disk data consistent, at the cost of aborting the current
transaction and making the fs read-only.
VS
Keep the fs running and just discard the qgroup data.

Although the truth is, either way we may eventually go to
abort_transaction(), since we failed to read some tree blocks.

But since there are still some BUG_ON()s wandering around in the wild, the
latter one seems a little better.
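
The two options can be sketched in a small userspace model. This is only an illustration under stated assumptions, not the patch itself: `struct fs_model`, `enum policy`, `trace_extent_post_model()` and `add_delayed_ref_model()` are hypothetical names, `qgroup_inconsistent` stands in for BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT, and `read_only` stands in for the effect of aborting the transaction.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of the relevant fs state. */
struct fs_model {
	bool read_only;           /* set when the transaction is aborted */
	bool qgroup_inconsistent; /* models the qgroup INCONSISTENT flag */
};

enum policy {
	POLICY_ABORT,    /* propagate the error; fs goes read-only */
	POLICY_TOLERATE, /* flag qgroup as inconsistent and keep running */
};

/* Stand-in for btrfs_qgroup_trace_extent_post(): 0 or a negative errno. */
static int trace_extent_post_model(int injected_err)
{
	assert(injected_err <= 0);
	return injected_err;
}

static int add_delayed_ref_model(struct fs_model *fs, enum policy p,
				 int injected_err)
{
	int ret = trace_extent_post_model(injected_err);

	if (ret == 0)
		return 0;

	if (p == POLICY_TOLERATE) {
		/* Only the quota numbers are lost; warn and carry on. */
		fs->qgroup_inconsistent = true;
		fprintf(stderr,
			"warning: qgroup accounting failed (err %d), quota inconsistent\n",
			ret);
		return 0;
	}

	/* POLICY_ABORT: on-disk data stays consistent, fs stops accepting writes. */
	fs->read_only = true;
	return ret;
}
```

With `POLICY_TOLERATE` and an injected -EIO the model returns 0 and only marks the quota inconsistent; with `POLICY_ABORT` it returns -EIO and the fs ends up read-only.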
> In that case even
> the "STATUS_FLAG_INCONSISTENT" being set in qgroup_trace_extent_post
> might be "wrong" in the sense that a failure from this function doesn't
> necessarily make the quota inconsistent if upper layers can handle the
> failures and revert their work.

Well, most of them just abort the transaction, which leads to the above
trade-off.

Unfortunately, there is not much we can do in the error handler. :(

Thanks,
Qu
> So I'm now starting to think that the
> inconsistent flag should be set in add_delayed_tree_ref, but this sort
> of leaks the qgroup implementation detail into the delayed tree ref
> function...
>>
>> Thanks,
>> Qu
>>
>>> + }
>>> return 0;
>>>
>>> free_head_ref:
>>> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
>>> index b2ab5f795816..33f9dba44e92 100644
>>> --- a/fs/btrfs/qgroup.c
>>> +++ b/fs/btrfs/qgroup.c
>>> @@ -1440,8 +1440,10 @@ int btrfs_qgroup_trace_extent_post(struct
>>> btrfs_fs_info *fs_info,
>>> int ret;
>>>
>>> ret = btrfs_find_all_roots(NULL, fs_info, bytenr, 0, &old_root, false);
>>> - if (ret < 0)
>>> + if (ret < 0) {
>>> + fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT;
>>> return ret;
>>> + }
>>>
>>> /*
>>> * Here we don't need to get the lock of
>>> @@ -2933,7 +2935,7 @@ static int __btrfs_qgroup_release_data(struct inode
>>> *inode,
>>> if (free && reserved)
>>> return qgroup_free_reserved_data(inode, reserved, start, len);
>>> extent_changeset_init(&changeset);
>>> - ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
>>> + ret = clear_record_extent_bits(&BTRFS_I(inode)->io_tree, start,
>>> start + len -1, EXTENT_QGROUP_RESERVED, &changeset);
>>> if (ret < 0)
>>> goto out;
>>>
>>
