This is awesomely cool.

From: tom stuart <[EMAIL PROTECTED]>
To: Greg Roelofs <[EMAIL PROTECTED]>
Cc: (censored)
Subject: (censored)
Message-ID: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
Date: Mon, 25 Feb 2002 09:34:42 +0000

Greg Roelofs ([EMAIL PROTECTED]) wrote:
> Joshua Schachter wrote:
> > [...] cp -al and rsync incremental backup trick
> I was going to ask privately, but it occurs to me that others might be
> interested in the answer:  is this trick documented somewhere?

i'm not sure whether you're asking about rsync in general or cp -al +
rsync specifically, so i'll touch both bases. i've no idea of how
familiar you are with all this, so please forgive any under- or
overcomplexity.

rsync [1] is generally very useful [2] for backups over a network, since
its algorithm [3] provides a clever way of generating diffs between a
local and a remote version of a file without needing to transfer the
whole file.

ordinarily rsync just keeps two separate collections of the same set of
files synchronised, as its name suggests, and that's good enough if all
you want is a backup of the current state of some files. but there's a
trick you can do with cp -al so that you actually accumulate a set of
incremental backups, storing only the changes to your files on any
schedule you like (daily, weekly, monthly, whatever). that requires much
less disk space than regularly storing new, complete, duplicate backups
of the same (usually mostly unchanged) files.

i couldn't find anything resembling "documentation" for it, but the gist
is that cp -al creates a hard-linked [4] copy of a tree of files, rather
than taking a real copy (so that, when you "cp -al /var/backup/2002-02-24
/var/backup/2002-02-25", /var/backup/2002-02-25 is full of hard links to
yesterday's files). rsyncing into that tree of links then breaks the
links to *only the files that have changed*: by default rsync builds
each updated file as a new temporary file and renames it over the old
name, rather than overwriting the shared data in place, so yesterday's
copy is left untouched.
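
to make the link-breaking concrete, here's a small sketch you can run in
a scratch directory. it doesn't call rsync at all: replace_like_rsync is
a made-up helper that only imitates the "write a temporary file, then
rename it" step, and the file names are invented for the demo. it
assumes GNU cp (for -al) and a shell whose test builtin understands -ef
(bash and dash both do).

```shell
# replace_like_rsync FILE CONTENT
# hypothetical helper: imitates rsync's default update, i.e. write a new
# temporary file next to the target, then rename it over the old name.
replace_like_rsync() {
    tmp=$(mktemp "$1.XXXXXX") || return 1
    printf '%s\n' "$2" > "$tmp" && mv -f "$tmp" "$1"
}

work=$(mktemp -d)
mkdir "$work/2002-02-24"
echo "version 1" > "$work/2002-02-24/index.html"
echo "unchanged" > "$work/2002-02-24/notes.txt"

# hard-linked copy: the directories are new, but every file shares its
# inode with yesterday's copy.
cp -al "$work/2002-02-24" "$work/2002-02-25"

# "today's" update touches only one file.
replace_like_rsync "$work/2002-02-25/index.html" "version 2"

# -ef is true when both names refer to the same inode.
[ "$work/2002-02-24/notes.txt" -ef "$work/2002-02-25/notes.txt" ] \
    && echo "notes.txt: still shared"
[ "$work/2002-02-24/index.html" -ef "$work/2002-02-25/index.html" ] \
    || echo "index.html: link broken"

# yesterday's copy still holds the old content.
cat "$work/2002-02-24/index.html"
```

if the helper overwrote the file in place instead (say with a plain
"echo ... > file"), both directories would see the new content, which is
exactly what you don't want from a backup.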

that's probably unclear, so for example, you might choose to run:

 cp -al /var/backup/2002-02-24 /var/backup/2002-02-25
 rsync --archive --delete rsync://user@host/dir /var/backup/2002-02-25
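
if you wanted to run that pair of commands regularly, they could be
wrapped up roughly like this. it's only a sketch: snapshot and its four
arguments are names made up here, it assumes GNU cp and an installed
rsync, and it does no locking or error reporting beyond the basics.

```shell
# snapshot SRC ROOT PREV NEW   (hypothetical helper, not part of rsync)
#   SRC   anything rsync accepts as a source, e.g. rsync://user@host/dir
#   ROOT  directory holding the dated backups, e.g. /var/backup
#   PREV  name of the previous backup directory, e.g. 2002-02-24
#   NEW   name of the backup to create, e.g. 2002-02-25
snapshot() {
    src=$1 root=$2 prev=$3 new=$4

    # refuse to reuse a name: cp -al into an existing directory would
    # nest the old tree inside it instead of cloning it.
    if [ -e "$root/$new" ]; then
        echo "snapshot: $root/$new already exists" >&2
        return 1
    fi

    if [ -d "$root/$prev" ]; then
        # clone the previous tree as hard links.
        cp -al "$root/$prev" "$root/$new" || return 1
    else
        # first run: nothing to link against, so start empty.
        mkdir -p "$root/$new" || return 1
    fi

    # bring the clone up to date; changed files get new inodes,
    # unchanged ones stay linked to the previous backup.
    rsync --archive --delete "$src" "$root/$new/"
}
```

the example above would then be
"snapshot rsync://user@host/dir /var/backup 2002-02-24 2002-02-25".
with GNU date you could fill in the names with "date -d yesterday +%F"
and "date +%F" rather than typing them.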

if only a few files have changed between yesterday and today, then
/var/backup/2002-02-25 will contain *mostly* just hard links to the
unchanged files in /var/backup/2002-02-24, but the few files that've
changed will have had their hard links removed from
/var/backup/2002-02-25 and replaced with a real new file. the practical
upshot of this is that every new backup directory you make will only
consume the hard drive space required for whatever files have changed
since the last backup, rather than eating up a whole new chunk of disk
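space equivalent to the size of the entire backup every time. (you can
imagine that 2002-02-26's backup will consist mostly of hard links
shared with 2002-02-25, most of which are in turn shared with
2002-02-24, and so on. strictly speaking no directory holds the
"original": every hard link is an equal name for the same data, so
deleting an old backup directory doesn't damage the newer ones.)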
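
you can check the disk space claim with du, which counts each inode only
once per run. the sketch below fakes two daily backups of three 1 MiB
files with one file changed; the sizes, file names and the "a.tmp" name
are all invented for the demo, and the rename stands in for what rsync
would do. it assumes GNU cp.

```shell
work=$(mktemp -d)
mkdir "$work/2002-02-24"

# three 1 MiB files of random data stand in for a real backup.
for f in a b c; do
    head -c 1048576 /dev/urandom > "$work/2002-02-24/$f"
done

# next day's backup starts as a tree of hard links.
cp -al "$work/2002-02-24" "$work/2002-02-25"

# one file changes: a new file is written and renamed into place.
head -c 1048576 /dev/urandom > "$work/2002-02-25/a.tmp"
mv -f "$work/2002-02-25/a.tmp" "$work/2002-02-25/a"

# per-directory view: the second directory is only charged for files
# that weren't already counted in the first.
du -sk "$work/2002-02-24" "$work/2002-02-25"

# overall cost of keeping both backups, in KiB.
total_kb=$(du -sk "$work" | cut -f1)
echo "total: ${total_kb} KiB for two backups"
```

two full copies would need about 6 MiB; here the total should come out
at a little over 4 MiB, i.e. one full backup plus the one changed file.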

it's really useful to have backups in this form, because if you
desperately need that version of your home page from three weeks ago,
just flip to the right backup directory and there it is.

this is all straight from hazy memory, so i've probably committed at
least one glaring error, omission or blatant untruth; can someone
correct me?

cheers,
-t

[1] http://rsync.samba.org/
[2] http://www.ccp14.ac.uk/ccp14admin/rsync/
[3] http://rsync.samba.org/rsync/tech_report/
[4] http://www.gnu.org/manual/cfengine-1.6.3/html_node/cfengine-Reference_59.html

____________________________________________
tom stuart http://obsess.com/ [EMAIL PROTECTED]

-- 
<[EMAIL PROTECTED]>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either.  :)
