[I'm combining the messages again, since I feel a bit bad when I write so many mails to the list ;) ] But from my side, feel free to split them up as much as you want (perhaps not into single characters or so ;) )
On Thu, 2015-12-17 at 04:06 +0000, Duncan wrote:
> Just to mention here, that I said "integrity management features", which
> includes more than checksumming. As Austin Hemmelgarn has been pointing
> out, DBs and some VMs do COW, some DBs do checksumming or at least have
> that option, and both VMs and DBs generally do at least some level of
> consistency checking as they load. Those are all "integrity management
> features" at some level.
Okay... well, but the point of that whole thread was obviously data integrity protection in the sense of what data checksumming does in btrfs for CoWed data and for meta-data. In other words: checksums at some block level, which are verified upon every read.

> As for bittorrent, I /think/ the checksums are in the torrent files
> themselves (and if I'm not mistaken, much as git, the chunks within the
> file are actually IDed by checksum, not specific position, so as long as
> the torrent is active, uploading or downloading, these will by definition
> be retained). As long as those are retained, the checksums should be
> retained. And ideally, people will continue to torrent the files long
> after they've finished downloading them, in which case they'll still need
> the torrent files themselves, along with the checksums info.
Well, I guess we don't need to hang ourselves up so much on the p2p formats. They're just one example; even if these were actually integrity protected in the sense described above, fine, but there are other major use cases left for which this is not the case.
Of course one can also always argue that users can then manually move the files out of the no-CoWed area, or manually create their own checksums as I do and store them in XATTRs. But all this is not real, proper, full checksum protection: there are gaps where things are not protected, and normal users may simply not do/know all this (and why shouldn't they still benefit from proper checksumming if we can make it for them).
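(For illustration, the manual xattr workaround I mean could be sketched roughly like this; it's just a sketch assuming Linux and a filesystem with user-xattr support, and the attribute name "user.sha256" is merely my own convention, nothing standardized:)

```python
import hashlib
import os

def sha256_file(path, bufsize=1 << 20):
    """Stream the file through SHA-256, so files larger than RAM work."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def store_checksum(path):
    """Record the digest in a user xattr (Linux-only, fs must support xattrs)."""
    digest = sha256_file(path)
    os.setxattr(path, "user.sha256", digest.encode())
    return digest

def verify_checksum(path):
    """Compare the stored digest against a freshly computed one."""
    stored = os.getxattr(path, "user.sha256").decode()
    return stored == sha256_file(path)
```

And exactly this is the gap argument: such a check only catches corruption at the moments one runs it, whereas fs-level checksumming verifies on every read.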
IMHO, even the argument that one could manually make checksums or move the file to the CoWed area while the e.g. downloaded files are still in cache doesn't count: that wouldn't work for VMs, DBs, and certainly not for torrent files larger than the memory.

> Meanwhile, if they do it correctly there's no window without protection,
> as the torrent file can be used to double-verify the file once moved, as
> well, before deleting it.
Again, that would work only for torrent-like files, not for VM images, and only partially for DBs... plus... why require users to do it manually, if the fs could take care of it?

On Thu, 2015-12-17 at 05:07 +0000, Duncan wrote:
> > I'm kinda curious what free space fragmentation actually means here.
> > 
> > Is it simply like this:
> > +----------+-----+---+--------+
> > |    F     |  D  | F |   D    |
> > +----------+-----+---+--------+
> > Where D is data (i.e. files/metadata) and F is free space.
> > In other words, (F)ree space itself is not further subdivided and only
> > fragmented by the (D)ata extents in between.
> > 
> > Or is it more complex like this:
> > +-----+----+-----+---+--------+
> > |  F  | F  |  D  | F |   D    |
> > +-----+----+-----+---+--------+
> > Where the (F)ree space itself is subdivided into "extents" (not
> > necessarily of the same size), and btrfs couldn't use e.g. the first two
> > F's as one contiguous amount of free space for a larger (D)ata extent
> At the one level, I had the simpler f/d/f/d scheme in mind, but that
> would be the case inside a single data chunk. At the higher file level,
> with files significant fractions of the size of a single data chunk to
> much larger than a single data chunk, the more complex and second
> f/f/d/f/d case would apply, with the chunk boundary as the separation
> between the f/f.
Okay, but that's only when there are data chunks that neighbour each other... since the data chunks are rather big normally (1 GiB), that shouldn't be such a big issue...
So I guess the real world looks like this:

        DC#1                           DC#2
...----+|+----------+-----+---+--------+|+----...
    F  |||    F     |  D  | F |   D    |||
...----+|+----------+-----+---+--------+|+----...

(with DC = data chunk)

but it could NOT look like this:

        DC#1                           DC#2
...----+|+-----+----+-----+---+--------+|+----...
    F  |||  F  | F  |  D  | F |   D    |||
...----+|+-----+----+-----+---+--------+|+----...

In other words, there could be 2 adjacent free space "extents" when these are actually parts of different neighbouring chunks, but there could NOT be >=2 adjacent free space "extents" as part of the same data chunk. Right?

> IOW, files larger than data chunk size will always be fragmented into
> data chunk size fragments/extents, at the largest, because chunks are
> designed to be movable using balance, device remove, replace, etc.
IOW, filefrag doesn't really show me directly whether a file is fragmented or not (at least not when the file is > chunk size)... There should be a better tool for that from the btrfs side :)

And one more (I think I found parts of the answer already below): Does defrag only try to defrag within chunks, or would it also try to align data chunks that "belong" together next to each other - or better said, would it try to place extents belonging together in neighbouring extents?
Or is it basically not really foreseen in btrfs that files with sizes > chunk size are really fully consecutive on disk?
Similarly, when freshly allocating a file larger than chunk size, would it try to choose the (already existing) chunks and allocate new ones so that its extents are contiguous even at chunk borders?
I think if files > chunk size were always fragmented at the chunk level,..
this may show up a problematic edge case: if a file is heavily accessed at regions that are at the chunk borders, one would always have seeks (at HDDs) when the next chunk is actually needed... and one could never defrag it fully, or at least any balance could "destroy" it again.
I guess nodatacow'ed areas also use the 1 GiB chunk size, right?

> Using the 1 GiB nominal figure, files over 1 GiB would always be broken
> into 1 GiB maximum size extents, corresponding to 1 extent per chunk.
I see...

> But based on real reports posting before and after numbers from filefrag
> (on uncompressed btrfs), we do have cases where defrag can't find 256 KiB
> free-space blocks and thus can actually fragment a file worse than it was
> before, so free-space fragmentation is indeed a very real problem.
btw: That's IMHO quite strange... or rather: I'd have thought that checking whether an extent gets even more fragmented than before would have been rather trivial...

On Thu, 2015-12-17 at 06:00 +0000, Duncan wrote:
> but as has been discussed elsewhere, on btrfs compressed
> files it will interpret each 128 KiB btrfs compression block as its own
> extent, even if (as seen in verbose mode) the next one begins where the
> previous one ends so it's really just a single extent.
Hmm, I took the opportunity and reported that as a wishlist bug upstream:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=808265

Cheers,
Chris.
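(For reference, this is roughly how I'd eyeball the extent layout with filefrag - just a sketch assuming filefrag from e2fsprogs is installed; and keeping in mind the two caveats from above: on compressed btrfs each 128 KiB compression block is miscounted as its own extent, and any file > chunk size will legitimately show at least one extent per chunk:)

```shell
# Sketch: print the extent layout filefrag reports for a file.
# The count is only a rough fragmentation indicator on btrfs:
#  - compressed files: each 128 KiB compression block shows as an extent;
#  - files > data chunk size (~1 GiB): at least one extent per chunk.
show_extents() {
    if command -v filefrag >/dev/null 2>&1; then
        # FIEMAP may be unsupported on some filesystems, hence the fallback.
        filefrag -v -- "$1" || echo "filefrag could not map $1" >&2
    else
        echo "filefrag (e2fsprogs) not installed" >&2
    fi
}
```

e.g. `show_extents /path/to/bigfile` before and after a defrag run, to see whether the defrag actually reduced the extent count.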