Quoting "Austin S. Hemmelgarn" <ahferro...@gmail.com>:
On 2019-09-11 13:20, webmas...@zedlx.com wrote:
Quoting "Austin S. Hemmelgarn" <ahferro...@gmail.com>:
On 2019-09-10 19:32, webmas...@zedlx.com wrote:
Quoting "Austin S. Hemmelgarn" <ahferro...@gmail.com>:
=== I CHALLENGE you and anyone else on this mailing list: ===
- Show me an example where splitting an extent requires
unsharing, and where this split is needed to defrag.
Make it clear, write it yourself, I don't want any machine-made outputs.
Start with the above comment about all writes unsharing the region
being written to.
Now, extrapolating from there:
Assume you have two files, A and B, each consisting of 64
filesystem blocks in a single shared extent. Now assume somebody
writes a few bytes to the middle of file B, right around the
boundary between blocks 31 and 32, and that you get similar writes
to file A straddling blocks 14-15 and 47-48.
After all of that, file A will be 5 extents:
* A reflink to blocks 0-13 of the original extent.
* A single isolated extent consisting of the new blocks 14-15
* A reflink to blocks 16-46 of the original extent.
* A single isolated extent consisting of the new blocks 47-48
* A reflink to blocks 49-63 of the original extent.
And file B will be 3 extents:
* A reflink to blocks 0-30 of the original extent.
* A single isolated extent consisting of the new blocks 31-32.
* A reflink to blocks 33-63 of the original extent.
Note that there are a total of four contiguous sequences of blocks
that are common between both files:
* 0-13
* 16-30
* 33-46
* 49-63
There is no way to completely defragment either file without
splitting the original extent (which is still there, just not
fully referenced by either file) unless you rewrite the whole file
to a new single extent (which would, of course, completely unshare
the whole file). In fact, if you want to ensure that those shared
regions stay reflinked, there's no way to defragment either file
without _increasing_ the number of extents in that file (either
file would need 7 extents to properly share only those 4 regions),
and even then only one of the files could be fully defragmented.
Such a situation generally won't happen if you're just dealing
with read-only snapshots, but is not unusual when dealing with
regular files that are reflinked (which is not an uncommon
situation on some systems, as a lot of people have `cp` aliased to
reflink things whenever possible).
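The block arithmetic in the example above can be modeled with a short sketch. This is plain illustrative Python, not btrfs code; the extent tuples and the `cow_write` and `intersect` helpers are made-up names, not any real API. It recovers the four contiguous shared sequences from the example:

```python
# Illustrative model of CoW extent splitting (not real btrfs code).
# A file is a list of (start, end, shared) block ranges; a write to
# blocks first..last lands in a fresh unshared extent, splitting any
# shared range it overlaps.

def cow_write(extents, first, last):
    out = []
    for start, end, shared in extents:
        if end < first or start > last:          # untouched extent
            out.append((start, end, shared))
            continue
        if start < first:                        # still-shared left part
            out.append((start, first - 1, shared))
        out.append((max(start, first), min(end, last), False))  # new data
        if end > last:                           # still-shared right part
            out.append((last + 1, end, shared))
    return out

def intersect(xs, ys):
    """Contiguous block ranges common to both lists of (start, end)."""
    return sorted((max(s1, s2), min(e1, e2))
                  for s1, e1 in xs for s2, e2 in ys
                  if max(s1, s2) <= min(e1, e2))

a = cow_write(cow_write([(0, 63, True)], 14, 15), 47, 48)  # file A: 5 extents
b = cow_write([(0, 63, True)], 31, 32)                     # file B: 3 extents

still_shared = lambda f: [(s, e) for s, e, sh in f if sh]
common = intersect(still_shared(a), still_shared(b))
# common == [(0, 13), (16, 30), (33, 46), (49, 63)]
```

Running it shows why no defrag of either file can merge extents without either splitting the original extent or unsharing one of those four regions.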
Well, thank you very much for writing this example. It is certainly
not minimal, as it seems to me that one write to file A and one
write to file B would have been sufficient to prove your point, so
there is one extra write in the example, but that's OK.
Your example proves that I was wrong. I admit: it is impossible to
perfectly defrag one subvolume (in the way I imagined it should be
done).
Why? Because, as in your example, there can be files within a
SINGLE subvolume which share their extents with each other. I
didn't consider such a case.
On the other hand, I judge this issue to be mostly irrelevant. Why?
Because most of the file sharing will be between subvolumes, not
within a subvolume.
Not necessarily. Even ignoring the case of data deduplication (which
needs to be considered if you care at all about enterprise usage,
and is part of the whole point of using a CoW filesystem), there are
existing applications that actively use reflinks, either directly or
indirectly (via things like the `copy_file_range` system call), and
the number of such applications is growing.
The same argument goes here: If data-deduplication was performed, then
the user has specifically requested it.
Therefore, since it was the user's will, the defrag has to honor it,
and so the defrag must not unshare deduplicated extents, because the
user wants them shared. This might prevent a perfect defrag, but that
is exactly what the user has requested, either directly or indirectly,
by some policy he has chosen.
If an application actively creates reflinked copies, then we can
assume it does so according to the user's will, therefore it is also a
command by the user, and defrag should honor it by not unsharing and
by being imperfect.
Now, you might point out that, in case of data-deduplication, we now
have a case where most sharing might be within-subvolume, invalidating
my assertion that most sharing will be between-subvolumes. But this is
an invalid (more precisely, irrelevant) argument. Why? Because the
defrag operation has to focus on doing what it can do, while honoring
user's will. All within-subvolume sharing is user-requested, therefore
it cannot be part of the argument to unshare.
You can't both perfectly defrag and honor deduplication. Therefore,
the defrag has to do the best possible thing while still honoring
user's will. <<<!!! So, the fact that the deduplication was performed
is actually the reason FOR not unsharing, not against it, as you made
it look in that paragraph. !!!>>>
If the system unshares automatically after deduplication, then the
user will need to run deduplication again. Ridiculous!
When a user creates a reflink to a file in the same subvolume, he
is willingly denying himself the assurance of a perfect defrag.
Because, as your example proves, if there are a few writes to BOTH
files, it becomes impossible to defrag perfectly. So, if the user
creates such reflinks, it's his own wish and his own fault.
The same argument can be made about snapshots. It's an invalid
argument in both cases though because it's not always the user who's
creating the reflinks or snapshots.
Um, I don't agree.
1) Actually, it is always the user who is creating reflinks, and
snapshots, too. Ultimately, it's always the user who does absolutely
everything, because a computer is supposed to be under his full
control. But, in the case of reflink-copies, this is even more true
because reflinks are not an essential feature for normal OS operation,
at least as far as today's OSes go. Every OS has to copy files around.
Every OS requires the copy operation. No current OS requires the
reflinked-copy operation in order to function.
2) A user can make any number of snapshots and subvolumes, but he can
at any time select one subvolume as a focus of the defrag operation,
and that subvolume can be perfectly defragmented without any unsharing
(except that the internal-reflinked files won't be perfectly
defragmented).
Therefore, the snapshotting operation can never jeopardize a perfect
defrag. The user can make many snapshots without any fears (I'd say a
total of 100 snapshots at any point in time is a good and reasonable
limit).
Such situations will occur only in some specific circumstances:
a) when the user is reflinking manually
b) when a file is copied from one subvolume into a different file
in a different subvolume.
The situation a) is unusual in normal use of the filesystem. Even
when it occurs, it is the explicit command given by the user, so he
should be willing to accept all the consequences, even the bad ones
like imperfect defrag.
The situation b) is possible, but as far as I know copies are
currently not done that way in btrfs. There should probably be an
option to reflink-copy files from another subvolume; that would be
good.
But anyway, it doesn't matter, because most of the sharing will be
between subvolumes, not within a subvolume. So, if there is some
in-subvolume sharing, the defrag won't be 100% perfect; that's a
minor, unimportant point.
You're focusing too much on your own use case here.
It's so easy to say that, but you really don't know. You might be
wrong. I might be the objective one, and you might be giving me some
groupthink-induced, badly-thought-out conclusions from years ago,
which were never rechecked because that's so hard to do. And then
everybody just repeats them and they become the truth. As Goebbels
said, if you repeat anything enough times, it becomes the truth.
Not everybody uses snapshots, and there are many people who are
using reflinks very actively within subvolumes, either for
deduplication or because it saves time and space when dealing with
multiple copies of mostly identical trees of files.
Yes, I guess there are many such users. Doesn't matter. What you are
proposing is that the defrag should break all their reflinks and
deduplicated data they painstakingly created. Come on!
Or, maybe the defrag should unshare to gain performance? Yes, but only
WHEN THE USER REQUESTS IT. So the defrag can unshare, but only by
request. Since this means the user is reversing his previous command
not to unshare, it has to be explicitly requested by the user, not
part of the default defrag operation.
As mentioned in the previous email, we actually did have a (mostly)
working reflink-aware defrag a few years back. It got removed
because it had serious performance issues. Note that we're not
talking a few seconds of extra time to defrag a full tree here,
we're talking double-digit _minutes_ of extra time to defrag a
moderate-sized (low-triple-digit GB) subvolume with dozens of
snapshots, _if you were lucky_ (if you weren't, you could be looking
at potentially multiple _hours_ of runtime for the defrag). The
performance scaled inversely with the number of reflinks involved
and the total amount of data in the subvolume being defragmented,
and was pretty bad even in the case of only a couple of snapshots.
Ultimately, there are a couple of issues at play here:
I'll reply to this in another post. This one is getting a bit too long.