On 2018-02-12 10:37, Ellis H. Wilson III wrote:
> On 02/11/2018 01:24 PM, Hans van Kranenburg wrote:
>> Why not just use `btrfs fi du <subvol> <snap1> <snap2>` now and then
>> and update your administration with the results, instead of putting
>> the burden of keeping track of all administration during every tiny
>> change all day long?

> I will look into that if using built-in group capacity functionality
> proves to be truly untenable.  Thanks!
As a general rule, unless you really need to actively prevent a subvolume from exceeding its quota, this approach will be more reliable and have much less performance impact than using qgroups.
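
For example, a periodic accounting pass could look something like this (a sketch; the paths are hypothetical, and you'd want to match the schedule to your churn rate):

    # Summarize usage for a subvolume and its snapshots in one pass;
    # -s prints one summary line per argument instead of per-file
    # output:
    btrfs filesystem du -s /data/subvol /data/snapshots/subvol.*

The "Exclusive" column is roughly the space that would be freed by deleting just that one subvolume or snapshot, which is often the number people turn to qgroups for.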

>>> CoW is still valuable for us as we're shooting to support on the
>>> order of hundreds of snapshots per subvolume,

>> Hundreds will get you into trouble even without qgroups.

> I should have been more specific.  We are looking to use up to a few
> dozen snapshots per subvolume, but will have many (tens to hundreds
> of) discrete subvolumes (each with up to a few dozen snapshots) in a
> BTRFS filesystem.  If I have it wrong and the scalability issues in
> BTRFS do not solely apply to subvolumes and their snapshot counts,
> please let me know.
The issue isn't so much the total number of snapshots as how many snapshots are sharing data. If each of your individual subvolumes shares no data with any of the others via reflinks (so no deduplication across subvolumes, and no copying files around using reflinks or the clone ioctl), then I would expect things to be just fine without qgroups, provided you're not deleting huge numbers of snapshots at the same time.
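
To make "sharing data via reflinks" concrete, here's a quick illustration (hypothetical paths; cp's --reflink option and the FICLONE ioctl are the usual sources of this kind of sharing):

    # A reflink copy shares extents with the source instead of
    # duplicating data.  If source and destination live in different
    # subvolumes, this creates exactly the cross-subvolume sharing
    # described above:
    cp --reflink=always /data/subvolA/big.db /data/subvolB/big.db

Snapshots create the same kind of extent sharing wholesale, which is why what matters is how many trees reference a given extent, not how many snapshots exist in total.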

With qgroups involved, I really can't say for certain, as I've never done much with them myself, but based on my understanding of how it all works, I would expect multiple subvolumes with a small number of snapshots each to not have as many performance issues as a single subvolume with the same total number of snapshots.
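
If you do end up experimenting with qgroups, the basic workflow is easy to try on a scratch filesystem first (hypothetical mount point; note that enabling quotas triggers a full accounting scan of existing data):

    # Turn on quota accounting for the whole filesystem:
    btrfs quota enable /data
    # List qgroups with their referenced/exclusive byte counts;
    # -p also prints the parent qgroups each one is assigned to:
    btrfs qgroup show -p /data

That lets you measure the overhead on your actual snapshot layout before committing to it in production.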

> I will note you focused on my tiny desktop filesystem when making some
> of your previous comments -- this is why I didn't want to share
> specific details.  Our filesystem will be RAID0 with six large HDDs
> (12TB each).  Reliability concerns do not apply to our situation for
> technical reasons, but if there are capacity scaling issues in BTRFS I
> should be aware of, I'd be glad to hear about them.  I have not seen
> such a limit in the technical documentation, and experiments so far on
> 6x6TB arrays have not shown any performance problems, so I'm inclined
> to believe the only scaling issue is with reflinks.  Correct me if I'm
> wrong.
BTRFS in general works fine at that scale, dependent of course on the level of concurrent access you need to support. Each tree update needs to lock a bunch of things in the tree itself, and having large numbers of clients writing to the same set of files concurrently can cause lock contention issues because of this, especially if all of them are calling fsync() or fdatasync() regularly. These issues can be mitigated by segregating workloads into their own subvolumes (each subvolume is a mostly independent filesystem tree), but it sounds like you're already doing that, so I don't think that would be an issue for you.
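
A minimal sketch of that kind of segregation (hypothetical paths):

    # Each subvolume gets its own file tree, so fsync-heavy clients
    # writing into one subvolume contend far less with writers in
    # another:
    btrfs subvolume create /data/workload-a
    btrfs subvolume create /data/workload-b

The isolation isn't total, since all subvolumes still share the global extent, chunk, and csum trees, but it avoids the file-tree lock contention described above.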

The only other possibility I can think of is that the performance hit from qgroups may scale not just with the number of snapshots of a given subvolume, but also with the total size of the subvolume (more data means more accounting work), though I'm not certain about that (it's just a hunch based on what I do know about qgroups).
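
One cheap way to test that hunch on a scratch filesystem (hypothetical mount point) is to time a full qgroup rescan, since a rescan has to re-account every extent the qgroups cover:

    # Force a complete qgroup accounting pass; -w blocks until done:
    time btrfs quota rescan -w /data

If the rescan time grows roughly with the amount of data rather than with the number of snapshots, that's a decent hint the per-transaction accounting cost does too.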

Now, there are some other odd theoretical cases that may cause issues when dealing with really big filesystems, but they're either really specific edge cases (for example, starting with a really small filesystem and gradually scaling it up in size as it gets full) or happen at scales far larger than what you're talking about (double-digit petabytes at the very least).
