Re: [gentoo-user] Re: Suggestions for backup scheme?

Wols Lists Sun, 04 Feb 2024 01:59:33 -0800

On 04/02/2024 06:24, Grant Edwards wrote:

On 2024-02-03, Wol <[email protected]> wrote:

On 03/02/2024 16:02, Grant Edwards wrote:

rsnapshot is an application that uses rsync to do
hourly/daily/weekly/monthly (user-configurable) backups of selected
directory trees. It's done using rsync to create snapshots. They are
in-effect "incremental" backups, because the snapshots themselves are
effectively "copy-on-write" via clever use of hard-links by rsync. A
year's worth of backups for me is 7 daily + 4 weekly + 12 monthly
snapshots for a total of 23 snapshots.  If nothing has changed during
the year, those 23 snapshots take up the same amount of space as a
single snapshot.
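
To make the hard-link point concrete, here is a minimal Python sketch. It is an illustration only, not how rsnapshot itself is implemented: the function name and the snap.N directory names are made up, and it assumes a filesystem that supports hard links. It creates one file, links it into the other 22 snapshot directories, and reports how many distinct inodes exist and how many names point at the data.

```python
import os
import tempfile

# Retention from the text: 7 daily + 4 weekly + 12 monthly snapshots.
SNAPSHOT_COUNT = 7 + 4 + 12


def link_count_demo(snapshot_count=SNAPSHOT_COUNT):
    """Hard-link one file into every snapshot directory.

    Returns (number of distinct inodes, link count on the shared inode).
    The layout and names here are hypothetical.
    """
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "snap.0")
        os.mkdir(first)
        original = os.path.join(first, "data.bin")
        with open(original, "wb") as fh:
            fh.write(b"x" * 4096)

        paths = [original]
        for i in range(1, snapshot_count):
            snap_dir = os.path.join(root, f"snap.{i}")
            os.mkdir(snap_dir)  # each snapshot still needs its own directory
            linked = os.path.join(snap_dir, "data.bin")
            os.link(original, linked)  # new name for the same inode, no new data
            paths.append(linked)

        inodes = {os.stat(p).st_ino for p in paths}
        return len(inodes), os.stat(original).st_nlink
```

Every snapshot directory shows the file, but the data is stored once; only the directory entries are per-snapshot.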


So as I understand it, it looks like you first do a "cp with hardlinks"
creating a complete new directory structure, but all the files are
hardlinks so you're not using THAT MUCH space for your new image?


No, the first snapshot is a complete copy of all files.  The snapshots
are on a different disk, in a different filesystem, and they're just
plain directory trees that you can browse with normal filesystem
tools. It's not possible to hard-link between the "live" filesystem
and the backup snapshots. The hard-links are to inodes "shared"
between different snapshot directory trees. The first snapshot copies
everything to the backup drive (using rsync).


Yes I get that. You create a new partition and copy all your files into it.

I create a new pv (physical volume), lv (logical volume), and copy all my files into it.


The next snapshot creates a second directory tree with all unchanged
files hard-linked to the files that were copied as part of the first
snapshot. Any changed files are just plain copied into the second snapshot
directory tree.
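
The link-or-copy step described above can be sketched roughly as follows. This is a simplified stand-in, not rsync's or rsnapshot's actual code: make_snapshot is a hypothetical helper, it compares file contents byte for byte (rsync normally decides from size and modification time), and it ignores symlinks, permissions on directories, and deletions.

```python
import filecmp
import os
import shutil


def make_snapshot(source, previous, new):
    """Build `new` as a full tree mirroring `source`.

    Files identical to their copy in `previous` are hard-linked to it;
    new or changed files are copied in full. Hypothetical helper.
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        # Directories cannot be hard-linked, so they are always recreated.
        os.makedirs(os.path.normpath(os.path.join(new, rel)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            old = os.path.normpath(os.path.join(previous, rel, name))
            dst = os.path.normpath(os.path.join(new, rel, name))
            if os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the existing inode
            else:
                shutil.copy2(src, dst)  # new or changed: store a full copy
```

Both snapshot trees must live on the same filesystem for os.link to work, which matches the point that the links are between snapshot trees and never back to the live filesystem.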

You create a complete new directory structure, which uses at least one block per directory. You can't hard link directories.


I create an LVM snapshot. Dunno how much space that takes - a couple of blocks?

You copy all the files that have changed, leaving the old copy in the old tree and the new copy in the new tree - for a 10MB file that's changed, you use 10MB.

I use rsync's "Overwrite in place" mode, so if I change 10 bytes at the end of that 10MB file I use ONE block to overwrite it (unless sod strikes). The old block is left in the old volume, the new block is left in the new volume.
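
A small Python sketch of the block arithmetic, assuming 4KB blocks and a 10MB file taken as exactly 10 * 1024 * 1024 bytes. It does not model rsync's real delta algorithm; changed_blocks is a made-up helper that just compares the two versions block by block to show how few blocks a small edit touches.

```python
BLOCK_SIZE = 4096  # assumed block size


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return the indices of the blocks that differ between two versions."""
    length = max(len(old), len(new))
    changed = []
    for index, offset in enumerate(range(0, length, block_size)):
        if old[offset:offset + block_size] != new[offset:offset + block_size]:
            changed.append(index)
    return changed


old = bytes(10 * 1024 * 1024)   # a 10MB file of zeros
new = old[:-10] + b"\x01" * 10  # same file, last 10 bytes changed
touched = changed_blocks(old, new)
```

Here only the final block differs. An edit that happens to straddle a block boundary would touch two blocks instead of one.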


The third snapshot does the same thing (starting with the second
snapshot directory tree).

So you end up with multiple directory trees (which could be large in themselves), and multiple copies of files that have changed. Which could be huge files.

I end up with ONE copy of my current data, and a whole bunch of dated mount points, each of which is a full copy as of that date, but only actually uses enough space to store a diff of the volume - if I change that 10MB file every backup, but only change let's say 10KB over three 4KB disk blocks, I've only used four blocks - 16KB - per backup!
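
Worked through as arithmetic, using the figures above (a 10MB file, 4KB blocks, four blocks written per backup) and taking MB and KB as binary units, which is an assumption. The function names are made up for the example.

```python
FILE_SIZE = 10 * 1024 * 1024  # the 10MB file that changes every backup
BLOCK_SIZE = 4 * 1024         # 4KB disk blocks
BLOCKS_PER_BACKUP = 4         # blocks written per backup, as stated in the text


def whole_file_cost(backups):
    """Bytes used when every backup stores a fresh copy of the changed file."""
    return backups * FILE_SIZE


def block_level_cost(backups):
    """Bytes used when every backup stores only the changed blocks."""
    return backups * BLOCKS_PER_BACKUP * BLOCK_SIZE


ratio = whole_file_cost(1) // block_level_cost(1)
```

With these numbers each backup costs 16KB instead of 10MB for that one file, a factor of 640.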


Rinse and repeat.

Old snapshot trees are simply removed a la 'rm -rf' when they're no
longer wanted.

So each snapshot is using the space required by the directory
structure, plus the space required by any changed files.


Sort of. The backup filesystem has to contain one copy of every file
so that there's something to hard-link to. The backup is completely
stand-alone, so it doesn't make sense to talk about all of the
snapshots containing only deltas. When you get to the "oldest"
snapshot, there's nothing to delta "from".


I get that - it's a different hard drive.

[...]

And that is why I like "ext over lvm copying with rsync" as my
strategy (not that I actually do it). You have lvm on your backup
disk. When you do a backup you do "rsync with overwrite in place",
which means rsync only writes blocks which have changed. You then
take an lvm snapshot which uses almost no space whatsoever.
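
A toy model of the copy-on-write idea, as a rough Python sketch. This is not LVM's implementation or on-disk format; the Volume class and its methods are invented for illustration. A snapshot starts empty and only gains a block when the origin is about to overwrite that block.

```python
class Volume:
    """Toy block device with copy-on-write snapshots (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        """Take a snapshot; it stores nothing until a block is overwritten."""
        snap = {}  # block index -> contents at snapshot time
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        """Overwrite one block in place, preserving the old contents
        in every snapshot that has not already saved that block."""
        for snap in self.snapshots:
            snap.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap):
        """Reconstruct the full volume as it was when `snap` was taken."""
        return [snap.get(i, block) for i, block in enumerate(self.blocks)]


vol = Volume([b"a", b"b", b"c", b"d"])
snap = vol.snapshot()   # costs no data blocks yet
vol.write(3, b"D")      # the next backup overwrites one block in place
```

After the write, the snapshot holds exactly one saved block, yet reading it back still yields the complete old volume.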

So to compare "lvm plus overwrite in place" to "rsnapshot", my
strategy uses the space for an lvm header and a copy of all blocks
that have changed.

Your strategy takes a copy of the entire directory structure, plus a
complete copy of every file that has changed. That's a LOT more.


I don't understand, are you saying that somehow your backup doesn't
contain a copy of every file?

YES! Let's make it clear though, we're talking about EVERY VERSION of every backed up file.

And you need to get your head round the fact I'm not - actually - backing up my filesystem. I'm actually snapshotting my disk volume, my disk partition if you like.

Your strategy contains a copy of every file in your original backup, a full copy of the file structure for every snapshot, and a full copy of every version of every file that's been changed.

My version contains a complete copy of the current backup and (thanks to the magic of lvm) a block level diff of every snapshot, which appears to the system as a complete backup, despite taking up much less space than your typical incremental backup.

To change analogies completely - think git. My lvm snapshot is like a git commit. Git only stores the current HEAD, and retrieves previous commits by applying diffs. If I "check out a backup" (ie mount a backup volume), lvm applies a diff to the live filesystem.


Cheers,
Wol
