On Tue, Jun 14, 2016 at 05:26:22PM +0800, Qu Wenruo wrote:
> When balancing data extents, qgroup will leak all its numbers for
> balanced data extents.
> 
> The root cause is the non-standard extent reference update used in
> balance code.
> 
> The problem happens in the following steps:
> (Use 4M as original data extent size, and 257 as src root objectid)
> 
> 1) Balance create new data extents and increase its refs
> 
> Balance will alloc new data extent and create new EXTENT_DATA in data
> reloc tree, while its refernece is increased with 2 different
> referencer: 257 and data reloc tree.
> 
> While at that time, file tree is still referring to old extents.
> 
> Extent bytenr   |    Real referencer    |     backrefs     |
> ------------------------------------------------------------
> New             | Data reloc            | Data reloc + 257 | << Mismatch
> Old             | 257                   | 257              |
> 
> Qgroup number: 4M + metadata
> 
> 2) Commit trans before merging reloc tree
> Then we goes to prepare_to_merge(), which will commit transacation.
> 
> In the qgroup update codes inside commit_transaction, although backref
> walk codes find the new data extents has 2 backref, but file tree
> backref can't find referencer(file tree is still referring to old
> extents), and data reloc doesn't count as file tree.
> 
> Extent bytenr   | nr_old_roots | nr_new_roots | qgroup change |
> --------------------------------------------------------------|
> New             | 0            | 0            | 0             |
> Old             | 1            | 1            | 0             |
> 
> Qgroup number: 4M + metadata +-0 = 4M + metadata
> 
> 3) Swap tree blocks and free old tree blocks
> Then we goes to merge_reloc_roots(), which swaps the tree blocks
> directly, and free the old tree blocks.
> Freeing tree blocks will also free its data extents, this goes through
> normal routine, and qgroup handles it well, decreasing the numbers.
> 
> And since new data extent is not updated here (updated in step 1), so
> qgroup won't scan new data extent.
> 
> Extent bytenr   | nr_old_roots | nr_new_roots | qgroup change |
> --------------------------------------------------------------|
> New             |-No modification, doesn't go through qgroup--|
> Old             | 1            | 0            | -4M           |
> 
> Qgroup number: 4M + metadata -4M = metadata
> 
> This patch will fix it by re-dirtying new extent at step 3), so backref
> walk and qgroup can get correct result.
> 
> And thanks to the new qgroup framework, we don't need to check whether
> it is needed to dirty some file extents. Even some unrelated extents are
> re-dirtied, qgroup can handle it quite well.
> 
> So we only need to ensure we don't miss some extents.
> 
> Reported-by: Mark Fasheh <[email protected]>
> Reported-by: Filipe Manana <[email protected]>
> Signed-off-by: Qu Wenruo <[email protected]>

Tested-by: Goldwyn Rodrigues <[email protected]>


> ---
>  fs/btrfs/relocation.c | 94 
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 94 insertions(+)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0477dca..f1d696d 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -31,6 +31,7 @@
>  #include "async-thread.h"
>  #include "free-space-cache.h"
>  #include "inode-map.h"
> +#include "qgroup.h"
>  
>  /*
>   * backref_node, mapping_node and tree_block start with this
> @@ -1750,6 +1751,78 @@ int memcmp_node_keys(struct extent_buffer *eb, int 
> slot,
>  }
>  
>  /*
> + * Helper function to fixup screwed qgroups caused by increased extent ref,
> + * which doesn't follow normal extent ref update behavior.
> + * (Correct behavior is, increase extent ref and modify source root in
> + *  one trans)
> + * No better solution as long as we're doing swapping trick to do balance.
> + */
> +static int qgroup_redirty_data_extents(struct btrfs_trans_handle *trans,
> +                                    struct btrfs_root *root, u64 bytenr,
> +                                    u64 gen)
> +{
> +     struct btrfs_fs_info *fs_info = root->fs_info;
> +     struct extent_buffer *leaf;
> +     struct btrfs_delayed_ref_root *delayed_refs;
> +     int slot;
> +     int ret = 0;
> +
> +     if (!fs_info->quota_enabled || !is_fstree(root->objectid))
> +             return 0;
> +     if (WARN_ON(!trans))
> +             return -EINVAL;
> +
> +     delayed_refs = &trans->transaction->delayed_refs;
> +
> +     leaf = read_tree_block(root, bytenr, gen);
> +     if (IS_ERR(leaf)) {
> +             return PTR_ERR(leaf);
> +     } else if (!extent_buffer_uptodate(leaf)) {
> +             ret = -EIO;
> +             goto out;
> +     }
> +
> +     /* We only care leaf, which may contains EXTENT_DATA */
> +     if (btrfs_header_level(leaf) != 0)
> +             goto out;
> +
> +     for (slot = 0; slot < btrfs_header_nritems(leaf); slot++) {
> +             struct btrfs_key key;
> +             struct btrfs_file_extent_item *fi;
> +             struct btrfs_qgroup_extent_record *record;
> +             struct btrfs_qgroup_extent_record *exist;
> +
> +             btrfs_item_key_to_cpu(leaf, &key, slot);
> +             if (key.type != BTRFS_EXTENT_DATA_KEY)
> +                     continue;
> +             fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
> +             if (btrfs_file_extent_type(leaf, fi) ==
> +                 BTRFS_FILE_EXTENT_INLINE ||
> +                 btrfs_file_extent_disk_bytenr(leaf, fi) == 0)
> +                     continue;
> +
> +             record = kmalloc(sizeof(*record), GFP_NOFS);
> +             if (!record) {
> +                     ret = -ENOMEM;
> +                     break;
> +             }
> +             record->bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
> +             record->num_bytes = btrfs_file_extent_disk_num_bytes(leaf, fi);
> +             record->old_roots = NULL;
> +
> +             spin_lock(&delayed_refs->lock);
> +             exist = btrfs_qgroup_insert_dirty_extent(delayed_refs, record);
> +             spin_unlock(&delayed_refs->lock);
> +             if (exist)
> +                     kfree(record);
> +     }
> +out:
> +     free_extent_buffer(leaf);
> +     return ret;
> +
> +}
> +
> +/*
>   * try to replace tree blocks in fs tree with the new blocks
>   * in reloc tree. tree blocks haven't been modified since the
>   * reloc tree was create can be replaced.
> @@ -1919,7 +1992,28 @@ again:
>                                       0);
>               BUG_ON(ret);
>  
> +             /*
> +              * Fix up the screwed up qgroups
> +              *
> +              * For tree blocks with extent data, new data extent's
> +              * backref is already increased with both file tree and data
> +              * reloc tree.
> +              * While trans is committed before we modify file tree.
> +              *
> +              * This makes qgroup can't account new data extents well,
> +              * as the file tree is still referring to old extents, not
> +              * new extents, backref walk will find the new extent only
> +              * referred by data reloc tree.
> +              * So qgroup is screwed up and didn't increase its ref/excl.
> +              *
> +              * Fix it up by re-dirtying qgroup record for data extents in
> +              * new tree blocks
> +              */
> +             ret = qgroup_redirty_data_extents(trans, dest, new_bytenr,
> +                                               new_ptr_gen);
>               btrfs_unlock_up_safe(path, 0);
> +             if (ret < 0)
> +                     break;
>  
>               ret = level;
>               break;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to