On Thu, Oct 10, 2019 at 10:39:28AM +0800, Qu Wenruo wrote:
> The overall idea of the new BG_TREE is pretty simple:
> Put BLOCK_GROUP_ITEMS into a separate tree.
>
> This brings one obvious enhancement:
> - Reduced mount time for large filesystems
>
> Although the kernel could accept BLOCK_GROUP_ITEMS in either tree
> (extent root or bg root), I'll leave in-kernel conversion, as an
> alternative to offline conversion, for a next step if there is a lot
> of interest in it.
>
> So for now, if an existing fs wants to take advantage of the BG_TREE
> feature, btrfs-progs will provide an offline conversion tool.
>
> [[Benchmark]]
> Physical device:   NVMe SSD
> VM device:         VirtIO block device, backed by a sparse file
> Nodesize:          4K (to bump up tree height)
> Extent data size:  4M
> Fs size used:      1T
>
> All file extents on disk are 4M in size, preallocated to reduce space
> usage (as the VM uses a loopback block device backed by a sparse file).
>
> Without patchset:
> Use ftrace function graph:
>
>  7)               |  open_ctree [btrfs]() {
>  7)               |    btrfs_read_block_groups [btrfs]() {
>  7) @ 805851.8 us |    }
>  7) @ 911890.2 us |  }
>
> btrfs_read_block_groups() takes 88% of the total mount time.
>
> With the patchset, using the -O bg-tree mkfs option:
>
>  6)               |  open_ctree [btrfs]() {
>  6)               |    btrfs_read_block_groups [btrfs]() {
>  6) * 91204.69 us |    }
>  6) @ 192039.5 us |  }
>
> open_ctree() now takes only 21% of the original mount time,
> and btrfs_read_block_groups() takes only 47% of the total open_ctree()
> execution time.
>
> The reason is obvious when considering how many tree blocks need to be
> read from disk:
> - Original extent tree:
>   nodes:  55
>   leaves: 1025
>   total:  1080
> - Block group tree:
>   nodes:  1
>   leaves: 13
>   total:  14
>
> On top of that, tree block readahead works well for the bg tree, since
> every item in it is read.
> Readahead for the extent tree, by contrast, is a disaster, as block
> group items are scattered across the whole extent tree.
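The percentages in the benchmark follow directly from the ftrace timings and block counts. As a quick cross-check (an editorial sketch, not part of the patch), the arithmetic can be reproduced in C; the constant and function names are illustrative, and the values are copied from the quoted output:

```c
#include <stdio.h>

/* Timings in microseconds, copied from the ftrace output quoted above. */
static const double open_ctree_before_us = 911890.2;
static const double read_bg_before_us    = 805851.8;
static const double open_ctree_after_us  = 192039.5;
static const double read_bg_after_us     = 91204.69;

/* Tree block counts (nodes + leaves) quoted above. */
static const int extent_tree_blocks = 55 + 1025;
static const int bg_tree_blocks     = 1 + 13;

/* Share of `whole` taken by `part`, as a percentage. */
static double percent(double part, double whole)
{
	return 100.0 * part / whole;
}

/* Prints the ratios that the benchmark text rounds to 88%, 21% and 47%. */
static void print_benchmark_ratios(void)
{
	printf("read_block_groups / open_ctree, before: %.1f%%\n",
	       percent(read_bg_before_us, open_ctree_before_us));
	printf("open_ctree after / before:              %.1f%%\n",
	       percent(open_ctree_after_us, open_ctree_before_us));
	printf("read_block_groups / open_ctree, after:  %.1f%%\n",
	       percent(read_bg_after_us, open_ctree_after_us));
	printf("tree blocks read: %d -> %d (%.0fx fewer)\n",
	       extent_tree_blocks, bg_tree_blocks,
	       (double)extent_tree_blocks / bg_tree_blocks);
}
```

Computed this way the three ratios come out to 88.4%, 21.1% and 47.5%, and the block group tree needs roughly 77x fewer tree blocks than the extent tree walk.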
>
> The reduction in mount time is already obvious even on a super fast
> NVMe disk with memory cache.
> It would be even more obvious if the fs were on spinning rust.
>
> Signed-off-by: Qu Wenruo <w...@suse.com>

You need to add

	fs_info->bg_root->block_rsv = &fs_info->delayed_refs_rsv;

to btrfs_init_global_block_rsv(), otherwise the bg root is left without
a block reservation and bad things will happen.

Thanks,

Josef
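For readers less familiar with btrfs internals, the effect of the suggested line can be modeled in userspace. This is a hypothetical, heavily simplified sketch, not kernel code: the structs keep only the fields involved, model_init_global_block_rsv() is an illustrative stand-in for btrfs_init_global_block_rsv(), the extent-root assignment assumes kernels of this era point the extent tree at delayed_refs_rsv, and the NULL check on bg_root assumes the root only exists when the BG_TREE feature is enabled.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures;
 * only the fields relevant to the review comment are kept. */
struct block_rsv {
	const char *name;
};

struct root {
	struct block_rsv *block_rsv;	/* NULL until explicitly assigned */
};

struct fs_info {
	struct block_rsv global_block_rsv;
	struct block_rsv delayed_refs_rsv;
	struct root *extent_root;
	struct root *bg_root;		/* assumed NULL without BG_TREE */
};

/* Illustrative model of btrfs_init_global_block_rsv(), not the real one. */
static void model_init_global_block_rsv(struct fs_info *fs_info)
{
	/* Assumption: mirrors the extent tree assignment in kernels of that era. */
	fs_info->extent_root->block_rsv = &fs_info->delayed_refs_rsv;

	/* The line from the review: give the bg tree the same reservation as
	 * the extent tree, where the block group items used to live. */
	if (fs_info->bg_root)
		fs_info->bg_root->block_rsv = &fs_info->delayed_refs_rsv;
}
```

Without the second assignment, bg_root->block_rsv in this model simply stays NULL, which is the gap the review points out.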