Thank you very much for your response Hans. Comments in-line, but I did
want to handle one miscommunication straight-away:
I'm a huge fan of BTRFS. If I came off like I was complaining, my
sincere apologies. To be completely transparent we are using BTRFS in
a very large project at my company, which I am lead architect on, and
while I have read the academic papers, perused a subset of the source
code, and been following its development in the background, I now need
to deeply understand where there might be performance hiccups. All of
our foreground I/O testing with BTRFS in RAID0/RAID1/single across
different SSDs and HDDs has been stellar, but we haven't dug too far
into snapshot performance, balancing, and other more background-oriented
performance. Hence my interest in finding documentation and analysis,
if any exists, that I can read and grok myself on the implications of
snapshot operations on foreground I/O. More in-line below:
On 02/09/2018 03:36 PM, Hans van Kranenburg wrote:
>> This has proven thus far less than helpful, as
>> the response tends to be "use less snapshots," or "disable quotas," both
>> of which strike me as intellectually unsatisfying answers
>
> Well, sometimes those answers help. :) "Oh, yes, I disabled qgroups, I
> didn't even realize I had those, and now the problem is gone."
I meant less than helpful for me, since for my project I need detailed
and fairly accurate capacity information per subvolume, and the
relationship between qgroups and subvolume performance wasn't being
spelled out in the responses. Please correct me if I am wrong about
needing qgroups enabled to see detailed capacity information
per-subvolume (including snapshots).
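For reference, this is roughly what I mean by per-subvolume capacity
information: a minimal Python sketch that parses `btrfs qgroup show --raw`
output once quotas are enabled with `btrfs quota enable <mnt>`. The sample
output below is illustrative, not captured from this system:

```python
def parse_qgroup_show(text):
    """Return {qgroupid: (referenced, exclusive)} in bytes from
    `btrfs qgroup show --raw <mnt>` output."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "0/257  1611661312  536870912";
        # skip the header and the "----" rule lines.
        if len(fields) == 3 and "/" in fields[0] and fields[1].isdigit():
            usage[fields[0]] = (int(fields[1]), int(fields[2]))
    return usage

# Illustrative output; qgroup 0/N corresponds to subvolume id N.
sample = """\
qgroupid         rfer         excl
--------         ----         ----
0/5             16384        16384
0/257      1611661312    536870912
0/258      1611661312     49152000
"""

for qgroupid, (rfer, excl) in parse_qgroup_show(sample).items():
    print(f"{qgroupid}: referenced {rfer} B, exclusive {excl} B")
```

The "excl" column is the interesting one for snapshots, since it shows how
much space each snapshot would free if deleted.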
>> the former in a filesystem where snapshots are supposed to be
>> "first-class citizens."
>
> Throwing complaints around is also not helpful.
Sorry about this. It wasn't directed in any way at BTRFS developers,
but rather at some of the solutions proposed in random online forums.
As mentioned, I'm a fan of BTRFS, especially as my project requires the
snapshots to truly be first-class citizens in that they are writable
and one can roll back to them at will, unlike in ZFS and other
filesystems. I was just saying it seemed backwards to suggest that
having fewer snapshots is the solution in a filesystem whose
architecture appears to treat them as a core part of the design.
> The "performance implications" are highly dependent on your specific
> setup, kernel version, etc, so it really makes sense to share:
>
> * kernel version
> * mount options (from /proc/mounts|grep btrfs)
> * is it ssd? hdd? iscsi lun?
> * how big is the FS
> * how many subvolumes/snapshots? (how many snapshots per subvolume)
I will answer the above, but would like to reiterate my previous comment
that I still would like to understand the fundamental relationships here,
as in my project the kernel version is very likely to change (to a more
recent one), along with mount options and underlying device media. Once
this project hits the field I will additionally have limited control
over how large the FS gets (until physical media space is exhausted of
course) or how many subvolumes/snapshots there are. If I know that
above N snapshots per subvolume performance tanks by M%, I can apply
limits on the use-case in the field, but I am not aware of those kinds
of performance implications yet.
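To make the "N snapshots costs M%" question concrete, this is the kind of
foreground-latency probe I have in mind: a Python sketch that times fsync'd
writes on the filesystem under test. The path and sizes are placeholders,
and the snapshot count would be varied externally between runs (e.g. a shell
loop running `btrfs subvolume snapshot` and then deleting some):

```python
import os
import statistics
import tempfile
import time

def write_latency_samples(path, block=64 * 1024, count=200):
    """Time `count` synchronous writes of `block` bytes to `path`;
    return per-write latencies in milliseconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        buf = os.urandom(block)
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, buf)
            os.fsync(fd)  # force the write through so latency reflects the device
            latencies.append((time.perf_counter() - t0) * 1000.0)
    finally:
        os.close(fd)
    return latencies

if __name__ == "__main__":
    # Placeholder target; point this at a file on the btrfs mount under test.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        target = f.name
    samples = sorted(write_latency_samples(target))
    os.unlink(target)
    print(f"p50={statistics.median(samples):.2f}ms "
          f"p99={samples[int(len(samples) * 0.99)]:.2f}ms")
```

Running this while btrfs-cleaner is working through deleted snapshots, and
comparing the p99 against an idle baseline, would give exactly the M% number
I'm after.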
My present situation is the following:
- Fairly default openSUSE 42.3.
- uname -a: Linux betty 4.4.104-39-default #1 SMP Thu Jan 4 08:11:03 UTC
2018 (7db1912) x86_64 x86_64 x86_64 GNU/Linux
- /dev/sda6 / btrfs
rw,relatime,ssd,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot 0 0
(I have about 10 other btrfs subvolumes, but this is the only one being
snapshotted)
- At the time of my noticing the slow-down, I had about 24 snapshots, 10
of which were in the process of being deleted
- Usage output:
~> sudo btrfs filesystem usage /
Overall:
    Device size:                  40.00GiB
    Device allocated:             11.54GiB
    Device unallocated:           28.46GiB
    Device missing:                  0.00B
    Used:                          7.57GiB
    Free (estimated):             32.28GiB   (min: 32.28GiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:               28.44MiB   (used: 0.00B)

Data,single: Size:11.01GiB, Used:7.19GiB
   /dev/sda6      11.01GiB

Metadata,single: Size:512.00MiB, Used:395.91MiB
   /dev/sda6     512.00MiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/sda6      32.00MiB

Unallocated:
   /dev/sda6      28.46GiB
> And what's essential to look at is what your computer is doing while you
> are throwing a list of subvolumes into the cleaner.
>
> * is it using 100% cpu?
> * is it showing 100% disk read I/O utilization?
> * is it showing 100% disk write I/O utilization? (is it writing lots and
> lots of data to disk?)
I noticed the problem when Thunderbird became completely unresponsive.
I fired up top, and btrfs-cleaner was at the top, along with snapper.
btrfs-cleaner was at 100% cpu (single-core) for the entirety of the
time. I knew I had about 24 snapshots prior to this, and after about
60s when the pain subsided only about 14 remained, so I estimate 10 were
deleted as part of snapper's cleaning algorithm. I also quickly ran
dstat during the slow-down; after 5s it finally started, and it reported
only about 3-6 MB/s of read and write to the drive in question.
I have since run top and dstat before running snapper cleaner manually,
and the system lock-up does still occur, albeit for shorter times as
I've only done it with a few snapshots and not much changed in each.
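For what it's worth, this is how I've been double-checking the btrfs-cleaner
CPU numbers alongside top: a Linux-only Python sketch that samples
utime+stime from /proc/<pid>/stat (field offsets per proc(5)) over a
one-second window:

```python
import os
import time

def cpu_ticks(pid):
    """Return utime+stime (in clock ticks) for `pid` from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The comm field may contain spaces/parens; split after the closing paren
    # so the remaining fields have fixed positions (state is index 0).
    fields = stat.rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])  # utime (field 14) + stime (field 15)

def find_pid(name):
    """Return the first pid whose comm matches `name`, or None."""
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/comm") as f:
                    if f.read().strip() == name:
                        return int(entry)
            except OSError:
                pass  # process exited between listdir and open
    return None

if __name__ == "__main__":
    pid = find_pid("btrfs-cleaner")
    if pid is None:
        print("btrfs-cleaner not running")
    else:
        hz = os.sysconf("SC_CLK_TCK")
        before = cpu_ticks(pid)
        time.sleep(1.0)
        used = (cpu_ticks(pid) - before) / hz
        print(f"btrfs-cleaner used {used * 100:.0f}% CPU over 1s")
```

Sampling this while deleting snapshots is what convinced me the bottleneck is
a single pegged core rather than disk bandwidth, which matches the low
dstat numbers above.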
Best,
ellis