On 2014-12-13 21:54, Robert White wrote:

- rsync many remote data sources (-a -H --inplace --partial) + snapshot

Using --inplace on a Copy On Write filesystem has only one effect, it
increases fragmentation... a lot...

...if the file was changed.


Every new block is going to get
written to a new area anyway,

Exactly - "every new block". But that's true with and without --inplace.
Also - without --inplace, it is "every block". In other words, without --inplace, rsync is likely to rewrite the whole file as a new one, and the CoW sharing is lost (more below).


so if you have enough slack space to
keep the one new copy of the new file, which you will probably use up
anyway in the COW event, laying in the fresh copy in a likely more
contiguous way will tend to make things cleaner over time.

--inplace is doubly useless with compression as compression is
perturbed by default if one byte changes in the original file.

No. If you change 1 byte in a 100 MB file, or perhaps a 1 GB file, you will likely lose a few kB of CoW sharing. The whole file is certainly not rewritten if you use --inplace. However, it will be wholly rewritten if you don't use --inplace.
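
To put rough numbers on that, here is a back-of-the-envelope sketch. All figures are assumptions for illustration, not measurements: a 1 GiB file, one 4 KiB block changed per sync, and 25 syncs each followed by a snapshot. With compression the real per-sync cost is probably somewhat higher than a single block.

```shell
#!/bin/sh
# Assumed numbers for illustration only (not measurements).
file_bytes=$((1024 * 1024 * 1024))   # 1 GiB file
block_bytes=4096                     # one filesystem block changed per sync
syncs=25                             # each sync is followed by a snapshot

# Without --inplace: every sync writes a full new copy, pinned by its snapshot.
rewrite_bytes=$((file_bytes * syncs))

# With --inplace: one full copy, then one changed block per later sync.
inplace_bytes=$((file_bytes + block_bytes * (syncs - 1)))

echo "without --inplace: $((rewrite_bytes / 1024 / 1024)) MiB"
echo "with --inplace:    $((inplace_bytes / 1024 / 1024)) MiB"
```

The gap grows linearly with the number of snapshots kept, which is why it matters for a setup with many of them.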


The only time --inplace might be helpful is if the file is NOCOW... except...

No, you're wrong.
By default, rsync creates a new file if it detects any file modification, even a timestamp-only one like "touch file".

Consider this experiment:

# create a "large file"
dd if=/dev/urandom of=bigfile bs=1M count=3000

# copy it with rsync
rsync -a -v --progress bigfile bigfile2

# copy it again - blazing fast, no change
rsync -a -v --progress bigfile bigfile2

# "touch" the original file
touch bigfile

# try copying again with rsync - notice rsync creates a temp file, like .bigfile2.J79ta2
# No change to the file except the timestamp, but good bye your CoW.
rsync -a -v --progress bigfile bigfile2

# Now try the same with --inplace; compare data written to disk with iostat -m in both cases.
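
A rough way to see the difference without iostat, and without btrfs at all, is to watch the inode number of the destination file. This is only a sketch under a few assumptions: a small 8 MB file instead of 3000 MB, a scratch directory from mktemp, and a helper named inode() that I made up for the example. I also added --no-whole-file to the --inplace run, because rsync defaults to --whole-file when both paths are local, which would not reflect a remote sync.

```shell
#!/bin/sh
# Sketch: does rsync replace the destination file or update it in place?
set -eu

command -v rsync >/dev/null 2>&1 || { echo "rsync not installed, skipping"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/bigfile"
dst="$workdir/bigfile2"

# Hypothetical helper: print the inode number of a file.
inode() { ls -i "$1" | awk '{print $1}'; }

# Create a (small) "large file" and make the initial copy.
dd if=/dev/urandom of="$src" bs=1M count=8 2>/dev/null
rsync -a "$src" "$dst"
ino_initial=$(inode "$dst")

# Timestamp-only change, default mode: rsync writes a temp file and renames it.
touch -t 200101010000 "$src"
rsync -a "$src" "$dst"
ino_default=$(inode "$dst")

# Timestamp-only change, --inplace: the existing file is updated in place.
touch -t 200201010000 "$src"
rsync -a --inplace --no-whole-file "$src" "$dst"
ino_inplace=$(inode "$dst")

echo "initial inode:         $ino_initial"
echo "after default rsync:   $ino_default"
echo "after --inplace rsync: $ino_inplace"
```

A changed inode after the default run means the destination is a brand new file, so nothing in it can be shared with older snapshots. An unchanged inode after the --inplace run means the existing file was kept, which is the precondition for unchanged blocks to stay shared.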


Same goes for files that are only appended to - even if they are compressed, most of the data will stay CoW-shared. I'd say it will be similar for lightly modified files (changed data will be CoW-unshared, some compressed "overhead" will be unshared, but the rest will be untouched / shared by CoW between the snapshots).



- around 500 snapshots in total, from 20 or so subvolumes

That's a lot of snapshots and subvolumes. Not an impossibly high
number, but a lot. That needs its own use-case evaluation. But
regardless...

Even if you set the NOCOW option on a file to make the --inplace rsync
work, if that file is snapshotted (snapshot?) between the rsync
modification events it will be in 1COW mode because of the snapshot
anyway and you are back to the default anti-optimal conditions.

Again - if the file was changed a lot, it doesn't matter if it's --inplace or not. If the file data was not changed, or changed only a little, --inplace will help preserve CoW.


Especially rsync's --inplace option combined with many snapshots and
large fragmentation was deadly for btrfs - I was seeing system freezes
right when rsyncing a highly fragmented, large file.

You are kind of doing all that to yourself.

To clarify - freezes - I mean kernel bugs exposed and machine freezing.
I think we all agree that whatever userspace is doing in the filesystem, it should not result in a kernel BUG / freeze.


Combining _forced_
compression with denying the natural opportunity for the re-write of
the file to move it to nicely contiguous "new locations" and then
pinning it all in place with multiple snapshots you've created the
worst of all possible worlds.

I disagree. It's quite compact, for my data usage. If I needed blazing fast file access, I wouldn't be using a CoW filesystem nor snapshots in the first place. For data mostly stored and rarely read, it is OK.


(...)

And keep repeating this to yourself :: "balance does not reorganize
anything, it just moves the existing disorder to a new location". This
is not a perfect summation, and it's clearly wrong if you are using
"convert", but it's the correct way to view what's happening while
asking yourself "should I balance?".

I agree - I don't run it unless I need to (or I'm curious to see if it would expose some more bugs). It would be quite a step back for a filesystem to need some periodic maintenance like that after all.

Also, I'm of the opinion that balance should not cause the kernel to BUG - it should abort, possibly remount the fs ro etc. (suggest running btrfsck, if there is enough confidence in this tool), but definitely not BUG.


--
Tomasz Chmielewski
http://www.sslrack.com
