This is awesomely cool.

From: tom stuart <[EMAIL PROTECTED]>
To: Greg Roelofs <[EMAIL PROTECTED]>
Cc: (censored)
Subject: (censored)
Message-ID: <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
Date: Mon, 25 Feb 2002 09:34:42 +0000

Greg Roelofs ([EMAIL PROTECTED]) wrote:
> Joshua Schachter wrote:
> > [...] cp -al and rsync incremental backup trick
> I was going to ask privately, but it occurs to me that others might be
> interested in the answer: is this trick documented somewhere?

i'm not sure whether you're asking about rsync in general or cp -al +
rsync specifically, so i'll touch both bases. i've no idea of how
familiar you are with all this, so please forgive any under- or
overcomplexity.

rsync [1] is generally very useful [2] for backups over a network,
since its algorithm [3] provides a clever way of generating diffs
between a local and a remote version of a file without needing to
transfer the whole file.

ordinarily rsync allows you to keep two separate collections of the
same set of files synchronised, as its name suggests -- and that's
normally good enough if all you want is to keep a backup of the current
state of some files -- but there's a trick you can do with cp -al so
that you actually accumulate a set of incremental backups, storing only
the changes to your files on an arbitrarily regular basis (daily,
weekly, monthly, whatever), which requires much less disk space than
regularly storing new, complete, duplicate backups of the same (usually
mostly unchanged) files.

i couldn't find anything resembling "documentation" for it, but the
gist is that cp -al creates a hard-linked [4] copy of a tree of files,
rather than taking a real copy (so that, when you
"cp -al /var/backup/2002-02-24 /var/backup/2002-02-25",
/var/backup/2002-02-25 is full of hard links to yesterday's files), and
then rsyncing that tree of links will make rsync break the links to
*only the files that have changed* when it updates the directory.

that's probably unclear.
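
in case it helps, here's a minimal sketch of the link-breaking behaviour
on its own, with no rsync or network involved. it assumes GNU cp (for
-al) and bash, and the directory and file names (day1, day2, index.html,
notes.txt) are made up for illustration. the "write a temporary file,
then rename it over the old name" step is a stand-in for what rsync
does, as far as i know, when it updates a file without --inplace; it is
not rsync itself.

```shell
#!/usr/bin/env bash
# sketch only: simulates the cp -al trick locally in a throwaway temp dir.
# assumes GNU cp; all names below are hypothetical.
set -eu

work=$(mktemp -d)
mkdir "$work/day1"
echo "old home page" > "$work/day1/index.html"
echo "never changes" > "$work/day1/notes.txt"

# step 1: hard-linked "copy" of yesterday's tree (no file data is duplicated)
cp -al "$work/day1" "$work/day2"

# step 2: stand-in for an rsync update of one changed file:
# write the new version to a temporary file, then rename it into place.
# the rename replaces day2's directory entry and leaves day1's file untouched.
echo "new home page" > "$work/day2/.index.html.tmp"
mv -f "$work/day2/.index.html.tmp" "$work/day2/index.html"

# -ef is true when both names refer to the same underlying file (same inode)
if [ "$work/day1/index.html" -ef "$work/day2/index.html" ]; then
    changed_shared=yes
else
    changed_shared=no
fi
if [ "$work/day1/notes.txt" -ef "$work/day2/notes.txt" ]; then
    unchanged_shared=yes
else
    unchanged_shared=no
fi
day1_content=$(cat "$work/day1/index.html")

echo "index.html shared between days: $changed_shared"
echo "notes.txt shared between days:  $unchanged_shared"
echo "day1 index.html still says:     $day1_content"

rm -rf "$work"
```

the point of the sketch is that the changed file ends up as two separate
files (one per day) while the unchanged file is still a single file with
two names, which is where the disk-space saving comes from. note that
editing a file in place inside day2 would *not* break the link, and
would change day1's copy too.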
but for example, you might choose to:

    cp -al /var/backup/2002-02-24 /var/backup/2002-02-25
    rsync --archive --delete rsync://user@host/dir /var/backup/2002-02-25

if only a few files have changed between yesterday and today, then
/var/backup/2002-02-25 will contain *mostly* just hard links to the
unchanged files in /var/backup/2002-02-24, but the few files that've
changed will have had their hard links removed from
/var/backup/2002-02-25 and replaced with a real new file.

the practical upshot of this is that every new backup directory you
make will only consume the hard drive space required for whatever files
have changed since the last backup, rather than eating up a whole new
chunk of disk space equivalent to the size of the entire backup every
time. (you can imagine that 2002-02-26's backup will consist mostly of
hard links to the files in 2002-02-25, which in turn are mostly hard
links back to 2002-02-24, which in turn are...)

it's really useful to have backups in this form, because if you
desperately need that version of your home page from three weeks ago,
just flip to the right backup directory and there it is.

this is all straight from hazy memory, so i've probably committed at
least one glaring error, omission or blatant untruth; can someone
correct me?

cheers,
-t

[1] http://rsync.samba.org/
[2] http://www.ccp14.ac.uk/ccp14admin/rsync/
[3] http://rsync.samba.org/rsync/tech_report/
[4] http://www.gnu.org/manual/cfengine-1.6.3/html_node/cfengine-Reference_59.html

____________________________________________
tom stuart    http://obsess.com/    [EMAIL PROTECTED]

-- 
<[EMAIL PROTECTED]>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either.  :)
