On 12.12.2017 09:34, Qu Wenruo wrote:
> [Overall]
> The previous rework on qgroup reservation system put a lot of effort on
> data, which works quite fine.
> 
> But it takes less focus on metadata reservation, causing some problem
> like metadata reservation underflow and noisy kernel warning.
> 
> This patchset will try to address the remaining problem of metadata
> reservation.
> 
> The idea of new qgroup metadata reservation is to use 2 types of
> metadata reservation:
> 1) Per-transaction reservation
>    Life span will be inside a transaction. Will be freed at transaction
>    commit time.
> 
> 2) Preallocated reservation
>    For case where we reserve space before starting a transaction.
>    Operation like dealloc and delayed-inode/item belongs to this type.
> 
>    This works similar to block_rsv, its reservation can be
>    reserved/released at any timing caller like.
> 
>    The only point to notice is, if preallocated reservation is used and
>    finished without problem, it should be converted to per-transaction
>    type instead of just freeing.
>    This is to co-operate with qgroup update at commit time.
> 
> For preallocated type, this patch will integrate them into inode_rsv
> mechanism reworked by Josef, and delayed-inode/item reservation.
> 
> 
> [Problem: Over-reserve]
> Currently the patchset addresses metadata underflow quite well, but
> due to the over-reserve nature of btrfs and highly bounded to inode_rsv,
> qgroup metadata reservation also tends to be over-reserved.
> 
> This is especially obvious for small limit.
> For 128M limit, it's will only be able to write about 70M before hitting
> quota limit.
> Although for larger limit, like 5G limit, it can reach 4.5G or more
> before hitting limit.
> 
> Such over-reserved behavior can lead to some problem with existing test
> cases (where limit is normally less than 20M).
> 
> While it's also possible to be addressed by use more accurate space other
> than max estimations.
> 
> For example, to calculate metadata needed for delalloc, we use
> btrfs_calc_trans_metadata_size(), which always try to reserve space for
> CoW a full-height tree, and will also include csum size.
> Both calculate is way over-killed for qgroup metadata reservation.
In private chat with Chris couple of months ago we discussed making the
reservation a lot less pessimistic. One assumption which we could
exploit is the fact that upon a tree split it's unlikely we will create
more than 1 additional level in the tree. So we could potentially modify
btrfs_calc_trans_metadata_size to take a root parameter and instead of

BTRFS_MAX_LEVEL * 2 we could change this to root_level * 2. How does
that sound?


> 
> [Patch structure]
> The patch is consist of 2 main parts:
> 1) Type based qgroup reservation
>    The original patchset is sent several months ago.
>    Nothing is modified at all, just rebased. And not conflict at all.
> 
>    It's from patch 1 to patch 6.
> 
> 2) Split meta qgroup reservation into per-trans and prealloc sub types
>    The real work to address metadata underflow.
>    Due to the over-reserve problem, this part is still in RFC state.
>    But the framework should mostly be fine, only needs extra fine-tuning
>    to get more accurate qgroup rsv to avoid too early limit.
> 
>    It's from patch 7 to 14.
> 
> Qu Wenruo (14):
>   btrfs: qgroup: Skeleton to support separate qgroup reservation type
>   btrfs: qgroup: Introduce helpers to update and access new qgroup rsv
>   btrfs: qgroup: Make qgroup_reserve and its callers to use separate
>     reservation type
>   btrfs: qgroup: Fix wrong qgroup reservation update for relationship
>     modification
>   btrfs: qgroup: Update trace events to use new separate rsv types
>   btrfs: qgroup: Cleanup the remaining old reservation counters
>   btrfs: qgroup: Split meta rsv type into meta_prealloc and
>     meta_pertrans
>   btrfs: qgroup: Don't use root->qgroup_meta_rsv for qgroup
>   btrfs: qgroup: Introduce function to convert META_PREALLOC into
>     META_PERTRANS
>   btrfs: qgroup: Use separate meta reservation type for delalloc
>   btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and
>     item
>   btrfs: qgroup: Use root->qgroup_meta_rsv_* to record qgroup meta
>     reserved space
>   btrfs: qgroup: Update trace events for metadata reservation
>   Revert "btrfs: qgroups: Retry after commit on getting EDQUOT"
> 
>  fs/btrfs/ctree.h             |  15 +-
>  fs/btrfs/delayed-inode.c     |  50 +++++--
>  fs/btrfs/disk-io.c           |   2 +-
>  fs/btrfs/extent-tree.c       |  49 +++---
>  fs/btrfs/file.c              |  15 +-
>  fs/btrfs/free-space-cache.c  |   2 +-
>  fs/btrfs/inode-map.c         |   4 +-
>  fs/btrfs/inode.c             |  27 ++--
>  fs/btrfs/ioctl.c             |  10 +-
>  fs/btrfs/ordered-data.c      |   2 +-
>  fs/btrfs/qgroup.c            | 350 
> ++++++++++++++++++++++++++++++++-----------
>  fs/btrfs/qgroup.h            | 102 ++++++++++++-
>  fs/btrfs/relocation.c        |   9 +-
>  fs/btrfs/transaction.c       |   8 +-
>  include/trace/events/btrfs.h |  73 ++++++++-
>  15 files changed, 537 insertions(+), 181 deletions(-)
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to