On 04/02/2024 06:24, Grant Edwards wrote:
On 2024-02-03, Wol <antli...@youngman.org.uk> wrote:
On 03/02/2024 16:02, Grant Edwards wrote:
rsnapshot is an application that uses rsync to do
hourly/daily/weekly/monthly (user-configurable) backups of selected
directory trees. It's done using rsync to create snapshots. They are
in-effect "incremental" backups, because the snapshots themselves are
effectively "copy-on-write" via clever use of hard-links by rsync. A
year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
snapshots for a total of 23 snapshots.  If nothing has changed during
the year, those 23 snapshots take up the same amount of space as a
single snapshot.
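A minimal sketch of why unchanged snapshots cost nothing extra. It assumes a POSIX filesystem that supports hard links. The `snap.N` directory names and the `unique_bytes` helper are hypothetical. The helper counts each (device, inode) pair once, roughly as `du` does, but it sums logical file sizes and ignores the directories themselves.

```python
import os
import tempfile

def unique_bytes(*trees: str) -> int:
    """Sum file sizes across trees, counting each hard-linked inode only once."""
    seen = set()
    total = 0
    for tree in trees:
        for dirpath, _dirnames, filenames in os.walk(tree):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)  # identity shared by all hard links
                if key not in seen:
                    seen.add(key)
                    total += st.st_size
    return total

with tempfile.TemporaryDirectory() as root:
    # 23 snapshots (7 daily + 4 weekly + 12 monthly) of one unchanged 1000-byte file.
    snaps = [os.path.join(root, f"snap.{n}") for n in range(23)]
    for snap in snaps:
        os.makedirs(snap)
    with open(os.path.join(snaps[0], "data.bin"), "wb") as f:
        f.write(b"x" * 1000)
    for snap in snaps[1:]:
        # Unchanged file: hard-link to the first copy instead of copying it.
        os.link(os.path.join(snaps[0], "data.bin"), os.path.join(snap, "data.bin"))

    links = os.stat(os.path.join(snaps[0], "data.bin")).st_nlink
    one_snapshot = unique_bytes(snaps[0])
    all_snapshots = unique_bytes(*snaps)
    print(links, one_snapshot, all_snapshots)  # 23 1000 1000
```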

So as I understand it, it looks like you first do a "cp with hardlinks"
creating a complete new directory structure, but all the files are
hardlinks so you're not using THAT MUCH space for your new image?

No, the first snapshot is a complete copy of all files.  The snapshots
are on a different disk, in a different filesystem, and they're just
plain directory trees that you can browse with normal filesystem
tools. It's not possible to hard-link between the "live" filesystem
and the backup snapshots. The hard-links are to inodes "shared"
between different snapshot directory trees. The first snapshot copies
everything to the backup drive (using rsync).

Yes I get that. You create a new partition and copy all your files into it.

I create a new pv (physical volume), lv (logical volume), and copy all my files into it.

The next snapshot creates a second directory tree with all unchanged
files hard-linked to the files that were copied as part of the first
snapshot. Any changed files are just plain copied into the second
snapshot directory tree.
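The link-or-copy step described above can be sketched as follows. This is a simplified model, not rsnapshot's or rsync's actual code. `make_snapshot` is a hypothetical helper, and "unchanged" here means the same size and modification time, a cut-down version of rsync's default quick check. rsync itself provides this behaviour through its `--link-dest` option.

```python
import os
import shutil
import tempfile

def make_snapshot(source, previous, dest):
    """Mirror `source` into `dest`, hard-linking files unchanged since `previous`."""
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        # Directories cannot be hard-linked, so each snapshot gets its own.
        os.makedirs(os.path.join(dest, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            new = os.path.join(dest, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old):
                s, o = os.stat(src), os.stat(old)
                if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                    os.link(old, new)  # unchanged: share the existing inode
                    continue
            shutil.copy2(src, new)  # new or changed: full copy, mtime preserved

with tempfile.TemporaryDirectory() as root:
    live = os.path.join(root, "live")
    os.makedirs(live)
    for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
        with open(os.path.join(live, name), "w") as f:
            f.write(text)
    snap1, snap2 = os.path.join(root, "snap-1"), os.path.join(root, "snap-2")
    make_snapshot(live, None, snap1)
    with open(os.path.join(live, "b.txt"), "w") as f:
        f.write("beta, edited")  # size changes, so the quick check notices
    make_snapshot(live, snap1, snap2)
    a_shared = os.path.samefile(os.path.join(snap1, "a.txt"), os.path.join(snap2, "a.txt"))
    b_shared = os.path.samefile(os.path.join(snap1, "b.txt"), os.path.join(snap2, "b.txt"))
    print(a_shared, b_shared)  # True False
```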

You create a complete new directory structure, which uses at least one block per directory. You can't hard link directories.

I create an LVM snapshot. Dunno how much that is - a couple of blocks?

You copy all the files that have changed, leaving the old copy in the old tree and the new copy in the new tree - for a 10MB file that's changed, you use 10MB.

I use rsync's "overwrite in place" mode (--inplace), so if I change 10 bytes at the end of that 10MB file I use ONE block to overwrite it (unless sod strikes). The old block is left in the old volume, the new block is left in the new volume.
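An idealised model of the in-place claim: if only the blocks that differ are rewritten, a 10-byte change at the end of a 10MB file touches a single 4KB block. `changed_blocks` and the 4KB block size are assumptions for illustration. A real `rsync --inplace` run works in terms of rsync's own delta-block size, so actual writes may be larger.

```python
BLOCK = 4096  # assumed filesystem block size

def changed_blocks(old: bytes, new: bytes, block: int = BLOCK) -> int:
    """Count the blocks an ideal in-place update would need to rewrite."""
    count = 0
    for start in range(0, max(len(old), len(new)), block):
        if old[start:start + block] != new[start:start + block]:
            count += 1
    return count

old = bytes(10 * 1024 * 1024)        # a 10MB file of zero bytes
new = old[:-10] + b"0123456789"      # overwrite only the last 10 bytes
print(changed_blocks(old, new))      # 1
```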

The third snapshot does the same thing (starting with the second
snapshot directory tree).

So you end up with multiple directory trees (which could be large in themselves), and multiple copies of files that have changed. Which could be huge files.

I end up with ONE copy of my current data, and a whole bunch of dated mount points, each of which is a full copy as of that date, but only actually uses enough space to store a diff of the volume - if I change that 10MB file every backup, but only change let's say 10KB, which touches at most four 4KB disk blocks, I've only used four blocks - 16KB - per backup!
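The block arithmetic behind that figure, as a small helper. `blocks_touched` is hypothetical and assumes fixed 4KB blocks. An LVM snapshot actually copies whole chunks, whose size is set when the snapshot is created (lvcreate's `--chunksize`), so a larger chunk size means more space per change.

```python
def blocks_touched(offset: int, length: int, block: int = 4096) -> int:
    """Number of fixed-size blocks overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // block
    last = (offset + length - 1) // block
    return last - first + 1

KB = 1024
print(blocks_touched(0, 10 * KB))       # 3 (block-aligned write, 12KB)
print(blocks_touched(3 * KB, 10 * KB))  # 4 (unaligned write, 16KB)
```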

Rinse and repeat.

Old snapshot trees are simply removed a la "rm -rf" when they're no
longer wanted.
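A sketch of the rotate-and-prune cycle for one interval. It assumes the `<interval>.<n>` naming (0 = newest) modelled on rsnapshot's layout. `rotate` and `RETENTION` are hypothetical names, and the retention counts are the ones quoted earlier in the thread.

```python
import os
import shutil
import tempfile

RETENTION = {"daily": 7, "weekly": 4, "monthly": 12}  # schedule quoted above

def rotate(root: str, interval: str, keep: int) -> None:
    """Drop <interval>.<keep-1>, shift the rest up by one, freeing <interval>.0."""
    oldest = os.path.join(root, f"{interval}.{keep - 1}")
    if os.path.exists(oldest):
        shutil.rmtree(oldest)  # the 'rm -rf' step
    for n in range(keep - 2, -1, -1):
        src = os.path.join(root, f"{interval}.{n}")
        if os.path.exists(src):
            os.rename(src, os.path.join(root, f"{interval}.{n + 1}"))

with tempfile.TemporaryDirectory() as root:
    keep = RETENTION["daily"]
    for day in range(1, 10):  # nine nightly runs
        rotate(root, "daily", keep)
        os.makedirs(os.path.join(root, "daily.0"))
        with open(os.path.join(root, "daily.0", "day"), "w") as f:
            f.write(str(day))
    names = sorted(os.listdir(root))
    with open(os.path.join(root, "daily.6", "day")) as f:
        oldest_day = f.read()  # the oldest snapshot still kept
    print(names, oldest_day, sum(RETENTION.values()))
```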

So each snapshot is using the space required by the directory
structure, plus the space required by any changed files.

Sort of. The backup filesystem has to contain one copy of every file
so that there's something to hard-link to. The backup is completely
stand-alone, so it doesn't make sense to talk about all of the
snapshots containing only deltas. When you get to the "oldest"
snapshot, there's nothing to delta "from".

I get that - it's a different hard drive.

[...]

And that is why I like "ext over lvm copying with rsync" as my
strategy (not that I actually do it). You have lvm on your backup
disk. When you do a backup you do "rsync with overwrite in place",
which means rsync only writes blocks which have changed. You then
take an lvm snapshot which uses almost no space whatsoever.
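A toy model of the copy-on-write behaviour that makes taking the snapshot nearly free. It loosely follows classic (non-thin) LVM snapshots: each snapshot has its own store, and the first write to an origin chunk after the snapshot saves the old chunk there. `CowVolume` is a hypothetical class, not LVM's implementation; thin snapshots share data differently.

```python
class CowVolume:
    """Toy copy-on-write volume: an origin plus per-snapshot stores of old chunks."""

    def __init__(self, chunks: list[bytes]) -> None:
        self.origin = list(chunks)
        self.snapshots: dict[str, dict[int, bytes]] = {}

    def snapshot(self, name: str) -> None:
        self.snapshots[name] = {}  # taking a snapshot copies no data

    def write(self, index: int, data: bytes) -> None:
        for store in self.snapshots.values():
            if index not in store:
                store[index] = self.origin[index]  # save the pre-write chunk once
        self.origin[index] = data

    def read_snapshot(self, name: str, index: int) -> bytes:
        # Saved chunk if the origin has changed since the snapshot, else the origin.
        return self.snapshots[name].get(index, self.origin[index])

    def snapshot_cost(self, name: str) -> int:
        return len(self.snapshots[name])  # chunks held by this snapshot

vol = CowVolume([b"A", b"B", b"C"])
vol.snapshot("monday")
vol.write(1, b"b")
vol.snapshot("tuesday")
vol.write(1, b"x")
vol.write(2, b"c")
print([vol.read_snapshot("monday", i) for i in range(3)])   # [b'A', b'B', b'C']
print([vol.read_snapshot("tuesday", i) for i in range(3)])  # [b'A', b'b', b'C']
print(vol.origin, vol.snapshot_cost("monday"))              # [b'A', b'x', b'c'] 2
```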

So to compare "lvm plus overwrite in place" to "rsnapshot", my
strategy uses the space for an lvm header and a copy of all blocks
that have changed.

Your strategy takes a copy of the entire directory structure, plus a
complete copy of every file that has changed. That's a LOT more.
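The two cost models side by side, with made-up numbers. The 1000 directories, the 4KB block and chunk sizes, and the helper names are all assumptions. The sketch ignores inode tables, LVM metadata and filesystem overhead, so it only shows the order of magnitude.

```python
KB = 1024
MB = 1024 * KB

def rsnapshot_cost(dirs: int, changed_file_sizes: list[int], block: int = 4 * KB) -> int:
    """Bytes per snapshot: at least one block per directory plus whole changed files."""
    return dirs * block + sum(changed_file_sizes)

def lvm_cost(changed_chunks: int, chunk: int = 4 * KB) -> int:
    """Bytes per snapshot: one chunk per origin chunk overwritten after it was taken."""
    return changed_chunks * chunk

# Hypothetical backup: 1000 directories, one 10MB file changed in four 4KB blocks.
print(rsnapshot_cost(1000, [10 * MB]) // KB, "KB")  # 14240 KB
print(lvm_cost(4) // KB, "KB")                      # 16 KB
```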

I don't understand, are you saying that somehow your backup doesn't
contain a copy of every file?

YES! Let's make it clear though, we're talking about EVERY VERSION of every backed up file.

And you need to get your head round the fact I'm not - actually - backing up my filesystem. I'm actually snapshotting my disk volume, my disk partition if you like.

Your strategy contains a copy of every file in your original backup, a full copy of the file structure for every snapshot, and a full copy of every version of every file that's been changed.

My version contains a complete copy of the current backup and (thanks to the magic of lvm) a block level diff of every snapshot, which appears to the system as a complete backup, despite taking up much less space than your typical incremental backup.

To change analogies completely - think version control. My lvm snapshot is like a commit in a system that keeps only the current HEAD in full and retrieves previous commits by applying diffs. If I "check out a backup" (ie mount a backup volume), lvm overlays the saved old blocks on the live volume.
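A sketch of the reverse-delta idea behind the analogy: keep the newest version in full, and for each older version only the blocks that differed. `ReverseDeltaStore` is hypothetical. Git itself stores whole objects and only delta-compresses them inside pack files, so this models the analogy rather than git's internals.

```python
class ReverseDeltaStore:
    """Newest version in full, plus one reverse delta (block -> old bytes) per commit."""

    def __init__(self, blocks: list[bytes]) -> None:
        self.head = list(blocks)
        self.deltas: list[dict[int, bytes]] = []  # deltas[-1] undoes the latest commit

    def commit(self, blocks: list[bytes]) -> None:
        # Assumes every version has the same number of blocks, like a fixed-size volume.
        delta = {i: old for i, (old, new) in enumerate(zip(self.head, blocks)) if old != new}
        self.deltas.append(delta)
        self.head = list(blocks)

    def checkout(self, steps_back: int) -> list[bytes]:
        if not 0 <= steps_back <= len(self.deltas):
            raise ValueError("no such version")
        version = list(self.head)
        # Undo the newest commit first, then walk further back.
        for delta in reversed(self.deltas[len(self.deltas) - steps_back:]):
            for i, old in delta.items():
                version[i] = old
        return version

store = ReverseDeltaStore([b"A", b"B"])
store.commit([b"A", b"b"])
store.commit([b"a", b"b"])
print(store.checkout(0), store.checkout(1), store.checkout(2))
```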

Cheers,
Wol

