This patchset can be fetched from github:
https://github.com/adam900710/linux/tree/qgroup_delayed_subtree
It is based on v5.0-rc1.
This patchset addresses the heavy qgroup subtree scan during balance by
delaying it until we are about to modify the swapped tree block.
The overall workflow is:
1) Record the subtree root block that gets swapped.
During subtree swap:
O = Old tree blocks
N = New tree blocks

     reloc tree                 file tree X
        Root                        Root
       /    \                      /    \
     NA      OB                  OA      OB
    /  |     |  \               /  |     |  \
  NC  ND    OE  OF            OC  OD    OE  OF

In this case, NA and OA are going to be swapped, so record (NA, OA) into
file tree X.
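A minimal userspace sketch of step 1, assuming the per-root swapped_blocks structure can be modeled as a singly linked list of (new, old) bytenr pairs. The record is keyed by the bytenr of the block that ends up in file tree X (NA), because that is what a later CoW looks up. `struct swapped_block`, `record_swapped_block()` and `reloc_bytenr` are hypothetical names chosen for illustration; `subv_bytenr` borrows the naming from the v4 changelog. This is a model, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one swapped pair. */
struct swapped_block {
	uint64_t subv_bytenr;  /* NA: lands in file tree X after the swap */
	uint64_t reloc_bytenr; /* OA: moves into the reloc tree */
	struct swapped_block *next;
};

/* Hypothetical per-root container, one per subvolume tree (e.g. X). */
struct swapped_blocks {
	struct swapped_block *head;
};

/* Step 1: remember (NA, OA) before swapping; returns 0, or -1 on ENOMEM. */
static int record_swapped_block(struct swapped_blocks *sb,
				uint64_t subv_bytenr, uint64_t reloc_bytenr)
{
	struct swapped_block *b;

	assert(sb != NULL);
	b = malloc(sizeof(*b));
	if (!b)
		return -1;
	b->subv_bytenr = subv_bytenr;
	b->reloc_bytenr = reloc_bytenr;
	b->next = sb->head;
	sb->head = b;
	return 0;
}
```

Note that recording is the only work done at swap time; no subtree is scanned yet.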
2) After subtree swap:

     reloc tree                 file tree X
        Root                        Root
       /    \                      /    \
     OA      OB                  NA      OB
    /  |     |  \               /  |     |  \
  OC  OD    OE  OF            NC  ND    OE  OF

3a) CoW happens for OB
If we are going to CoW tree block OB, we check OB's bytenr against
tree X's swapped_blocks structure.
It doesn't match any record, so nothing happens.
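Step 3a as a sketch: before CoW, look the block's bytenr up in the owning root's records and do nothing on a miss. The fixed-size array and `find_swapped_block()` are hypothetical simplifications for illustration, not the structure the patchset uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: subv_bytenr is the block now in file tree X. */
struct swapped_block {
	uint64_t subv_bytenr;
	uint64_t reloc_bytenr;
};

/* Hypothetical per-root table, modeled as a small array. */
struct swapped_blocks {
	struct swapped_block recs[16];
	size_t nr;
};

/* Called before CoW of @bytenr: return the matching record, or NULL. */
static struct swapped_block *find_swapped_block(struct swapped_blocks *sb,
						uint64_t bytenr)
{
	assert(sb != NULL);
	for (size_t i = 0; i < sb->nr; i++)
		if (sb->recs[i].subv_bytenr == bytenr)
			return &sb->recs[i];
	return NULL; /* miss (e.g. OB): CoW proceeds with no extra qgroup work */
}
```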
3b) CoW happens for NA
Check NA's bytenr against tree X's swapped_blocks, and get a hit.
Then we do a subtree scan on both subtrees, OA and NA, resulting in
6 tree blocks being scanned (OA, OC, OD, NA, NC, ND).
After that, no matter what we do to file tree X, the qgroup numbers will
still be correct.
Finally, NA's record gets removed from X's swapped_blocks.
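Step 3b can be modeled the same way. In this hypothetical sketch, `trace_before_cow()` stands in for the hook that runs before CoW, and `scan_subtree()` stands in for the qgroup subtree trace by simply counting visited blocks. Built from the trees in the diagrams, a hit on NA visits 6 blocks (OA, OC, OD, NA, NC, ND) and then drops the record, while OB returns 0. All names and the in-memory tree layout are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory tree block. */
struct tblock {
	uint64_t bytenr;
	struct tblock *child[4];
	int nr_child;
};

/* Hypothetical record of one swapped pair. */
struct swapped_block {
	struct tblock *subv;  /* NA: now in file tree X */
	struct tblock *reloc; /* OA: now in the reloc tree */
};

struct swapped_blocks {
	struct swapped_block recs[16];
	size_t nr;
};

/* Stand-in for the qgroup subtree trace: visit every block of a subtree. */
static int scan_subtree(const struct tblock *b)
{
	int n = 1;

	for (int i = 0; i < b->nr_child; i++)
		n += scan_subtree(b->child[i]);
	return n;
}

/*
 * Called before CoW of @b in file tree X. On a hit, scan both the new and
 * the old subtree, then drop the record so the scan is never repeated.
 * Returns the number of blocks scanned (0 on a miss).
 */
static int trace_before_cow(struct swapped_blocks *sb, const struct tblock *b)
{
	assert(sb != NULL && b != NULL);
	for (size_t i = 0; i < sb->nr; i++) {
		if (sb->recs[i].subv->bytenr != b->bytenr)
			continue;
		int n = scan_subtree(sb->recs[i].subv) +
			scan_subtree(sb->recs[i].reloc);
		sb->recs[i] = sb->recs[--sb->nr]; /* remove the record */
		return n;
	}
	return 0;
}
```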
4) Transaction commit
Any remaining record in X's swapped_blocks gets removed. Since there was
no modification to those swapped subtrees, there is no need to trigger
the heavy qgroup subtree rescan for them.
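The commit-time cleanup in step 4 is cheap because it only frees records. A sketch, again using hypothetical names (`clean_swapped_blocks()` and the linked-list layout are assumptions, not the patchset's API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record of one swapped pair. */
struct swapped_block {
	uint64_t subv_bytenr;
	uint64_t reloc_bytenr;
	struct swapped_block *next;
};

/* Hypothetical per-root container. */
struct swapped_blocks {
	struct swapped_block *head;
};

/*
 * Step 4, at transaction commit: every record still present belongs to a
 * subtree that was never CoWed, so it is freed without any subtree scan.
 * Returns the number of records dropped.
 */
static int clean_swapped_blocks(struct swapped_blocks *sb)
{
	int dropped = 0;

	assert(sb != NULL);
	while (sb->head) {
		struct swapped_block *next = sb->head->next;

		free(sb->head);
		sb->head = next;
		dropped++;
	}
	return dropped;
}
```

This asymmetry is where the saving comes from: a swapped subtree that is never CoWed before commit costs one record and one free, instead of a full scan of both subtrees.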
[[Benchmark]] (*)
Hardware:
VM 4G vRAM, 8 vCPUs,
disk is using 'unsafe' cache mode,
backing device is a Samsung 850 EVO SSD.
Host has 16G RAM.
Mkfs parameter:
--nodesize 4K (To bump up tree size)
Initial subvolume contents:
4G data copied from /usr and /lib.
(With enough regular small files)
Snapshots:
16 snapshots of the original subvolume.
each snapshot has 3 random files modified.
balance parameter:
-m
So the content should be quite similar to a real-world root fs layout.
After the file system is populated there is no other activity, so this
should be the best-case scenario.
                     | v4.20-rc1 | w/ patchset |   diff
-------------------------------------------------------
relocated extents    |     22615 |       22457 |  -0.7%
qgroup dirty extents |    163457 |      121606 | -25.6%
time (sys)           |   22.884s |     18.842s | -17.7%
time (real)          |   27.724s |     22.884s | -17.5%
*: Due to a bug in v5.0-rc1, balancing metadata with snapshots is
unacceptably slow even with quota disabled, so these results are from
v4.20-rc1.
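The diff column is the relative change against the v4.20-rc1 baseline, (new - old) / old. The helper below (a hypothetical name, only for checking the arithmetic) reproduces it; for example, the qgroup dirty extents row gives (121606 - 163457) / 163457 ≈ -25.6%, and the relocated extents row gives (22457 - 22615) / 22615 ≈ -0.7%.

```c
#include <assert.h>

/* Relative change from the baseline, in percent: (after - before) / before * 100. */
static double pct_diff(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```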
changelog:
v2:
- Rebase to v4.20-rc1.
- Instead of committing a transaction after each reloc tree merge, delay
  it until merge_reloc_roots() finishes.
  This provides more natural behavior and reduces unnecessary
  transaction commits.
v3:
- Fix the backref walk deadlock by not triggering it at all.
  This also removes the need for the @exec_post refactor and replaces the
  patch that allowed @old_root to be unpopulated.
- Include the patch that fixes the unexpected data rsv free.
v3.1:
- Rebased to v4.20-rc1.
Minor conflicts with some cleanup code.
v4:
- Rename members from "file_*" to "subv_*".
  Members like "file_bytenr" are pretty confusing; renaming them to
  "subv_bytenr" avoids the confusion.
- Use btrfs_root::reloc_dirty_list to replace dynamic memory allocation.
  One less point of failure, and no need to worry about GFP_KERNEL/NOFS.
  Furthermore, it's easier to manipulate a list than an rb tree.
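The v4 change relies on the intrusive-list pattern: the list node lives inside the root structure, so queueing a root needs no allocation and therefore cannot fail, which is why GFP_KERNEL vs GFP_NOFS stops mattering. A userspace sketch, assuming reloc_dirty_list is used to queue reloc roots whose cleanup is deferred; `struct root`, `queue_root()` and `drain_roots()` are hypothetical, and the list helpers are a minimal reimplementation modeled on the kernel's list_head, not the kernel's own headers.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct btrfs_root: the list node is embedded. */
struct root {
	unsigned long long objectid;
	struct list_head reloc_dirty_list;
};

/* Queueing a root never allocates, so it cannot fail. */
static void queue_root(struct list_head *dirty, struct root *r)
{
	list_add_tail(&r->reloc_dirty_list, dirty);
}

/* Drain the queue (e.g. once merging finishes); returns roots processed. */
static int drain_roots(struct list_head *dirty)
{
	int n = 0;

	assert(dirty != NULL);
	while (dirty->next != dirty) {
		struct root *r = container_of(dirty->next, struct root,
					      reloc_dirty_list);

		list_del_init(&r->reloc_dirty_list);
		n++; /* a real implementation would clean up @r here */
	}
	return n;
}
```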
Qu Wenruo (7):
btrfs: qgroup: Move reserved data account from btrfs_delayed_ref_head
to btrfs_qgroup_extent_record
btrfs: qgroup: Don't trigger backref walk at delayed ref insert time
btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots()
btrfs: qgroup: Refactor btrfs_qgroup_trace_subtree_swap()
btrfs: qgroup: Introduce per-root swapped blocks infrastructure
btrfs: qgroup: Use delayed subtree rescan for balance
btrfs: qgroup: Cleanup old subtree swap code
fs/btrfs/ctree.c | 8 +
fs/btrfs/ctree.h | 29 +++
fs/btrfs/delayed-ref.c | 39 +---
fs/btrfs/delayed-ref.h | 11 --
fs/btrfs/disk-io.c | 2 +
fs/btrfs/extent-tree.c | 3 -
fs/btrfs/qgroup.c | 356 ++++++++++++++++++++++++-----------
fs/btrfs/qgroup.h | 157 ++++++++++-----
fs/btrfs/relocation.c | 100 +++++++---
fs/btrfs/transaction.c | 1 +
include/trace/events/btrfs.h | 29 ---
11 files changed, 486 insertions(+), 249 deletions(-)