Hi Peter,

> http://blogs.techrepublic.com.com/10things/?p=895

It's odd they mention rsync but not one of the programs that kind of sit
above it, e.g. http://rdiff-backup.nongnu.org/features.html

> which do you use?

GNU tar.  I figure it's a simple file format I can interpret almost
anywhere I'm trying to recover data.  tar's -g option lets me do a full
backup, creating a big tar file, on the first day and then incremental
ones recording just what's changed since the last backup on subsequent
days.  If a file changes, tar stores the whole of the new one, not a
diff, so if there's some problem with an earlier tar file there's no
"chain" of edits that's been broken.  This causes issues if a file just
keeps growing though, e.g. xchat(1) IRC logs, but the backup script
ensures xchat isn't running before moving large log files to an archive
name so the tar file never gets to contain huge ones every day.
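For anyone curious what -g is doing, here's a rough Python analogue using only the standard library. It is not GNU tar's snapshot format. The JSON snapshot file, the name make_backup, and the size-plus-mtime test are all assumptions for illustration. GNU tar's own snapshot file records more than this, and this sketch ignores deleted files entirely. What it does show is the point above: a changed file goes into the new tar whole, never as a diff.

```python
import json
import os
import tarfile


def make_backup(src_dir, archive_path, snapshot_path):
    """Tar up files under src_dir that are new or changed since the last run.

    The snapshot (a hypothetical JSON file, not GNU tar's format) maps each
    relative path to [size, mtime_ns]. With no snapshot, everything is
    included, giving the full "level 0" archive. Keep archive_path and
    snapshot_path outside src_dir. Deleted files are not recorded.
    """
    try:
        with open(snapshot_path) as f:
            previous = json.load(f)
    except FileNotFoundError:
        previous = {}  # first run: full backup

    current, added = {}, []
    with tarfile.open(archive_path, "w") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                rel = os.path.relpath(path, src_dir)
                st = os.stat(path)
                current[rel] = [st.st_size, st.st_mtime_ns]
                if previous.get(rel) != current[rel]:
                    # The whole new file goes in, never a diff against an
                    # older copy, so no "chain" can break.
                    tar.add(path, arcname=rel)
                    added.append(rel)

    # Update the snapshot so the next run only picks up later changes.
    with open(snapshot_path, "w") as f:
        json.dump(current, f)
    return added
```

Run it once with no snapshot file to get the level 0; later runs against the same snapshot produce the incrementals.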

I compress the tar files with bzip2 and encrypt them with gpg(1).  gpg
would compress them anyway, but compressing first means that, once
decrypted, they don't balloon in size when I may not have enough
space.  They're then
copied from that first drive, which is the one that also has the data
that's being backed up, onto a second internal one that's normally
unmounted and spun down and is just there to store backups.  And also
onto an 8GB USB flash drive that's normally stored away from the PC --
both are unlikely to get nicked -- and that I can easily take with me if
I stay away from home.  Oh, and I upload them to some space on a
friend's geographically distant machine over ssh(1) so the house can
burn down without taking the backups with it.  If all of those get
trashed I've probably bigger problems to
worry about, like finding a shotgun, canned food, and a place to defend.
:-)
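A small Python sketch of the bzip2 step, using the standard bz2 module. The function name is made up, and the gpg step is left out because it needs the gpg binary and a key. It shows why the order matters: the compressed file is what you get back after decryption. Compressing after encryption would gain nothing either, since good ciphertext looks like random bytes.

```python
import bz2
import os
import shutil


def bzip2_file(src_path, dst_path, level=9):
    """Compress src_path into dst_path with bzip2.

    Streams the data, so a large tar never has to fit in memory.
    Returns (original_bytes, compressed_bytes). Encrypting dst_path
    with gpg afterwards is assumed and not shown here.
    """
    with open(src_path, "rb") as src, \
            bz2.open(dst_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(src_path), os.path.getsize(dst_path)
```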

When I say "copy" I really mean rsync, even within the one machine.  By
using its -c/--checksum option I can make it re-read the copies it made
previously, every byte, and ensure that they still match the originals.
Otherwise rsync would just rely on the file size and modification time.  This
"scrubs" the copies, allowing the media to spot any developing problems,
e.g. a hard drive may reallocate a sector that it only read successfully
after a few tries.  It avoids that "my backups are unreadable" moment
when you least want it.
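What --checksum changes is the test for "does this need copying?".  rsync's manual describes it as comparing a checksum for each file whose size matches, instead of the quick size-and-mtime check.  Here's a rough Python sketch of just that decision rule.  SHA-1 stands in for rsync's own checksum, there's no delta transfer, deletion, or permission handling, and checksum_sync is a hypothetical name.

```python
import hashlib
import os
import shutil


def file_sha1(path, bufsize=1 << 20):
    """Hash a file by reading every byte of it."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_copy(src, dst):
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True  # a size mismatch alone is enough
    # Same size: re-read both files in full and compare contents.
    return file_sha1(src) != file_sha1(dst)


def checksum_sync(src_dir, dst_dir):
    """(Re)copy files whose destination copy is missing or differs.

    Returns the sorted relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            if needs_copy(src, dst):
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 keeps the modification time
                copied.append(rel)
    return sorted(copied)
```

Running it a second time with nothing changed copies nothing, but it still re-reads every copy, which is the "scrub".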

Over time, the "everything" level-0 tar file plus all the incremental
level-1s start getting too big.  I then put them in an ISO, along with
the output of sha1sum(1), and burn three copies to CD.  One goes in the
post to somewhere else.  The other two stay with me.  One of those I
take with me if I'm going away somewhere for a while.  The sha1sum
output means that I can easily test the contents of the CD with `sha1sum
-c'.  How long these will remain readable is unknown, but I'm using
branded CDs and the latest backups are on other media.  The three
identical copies of the ISO mean I should be able to recover a complete
one if needed.  In time, I'll probably move these to something else if
it seems a better bet.
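A sketch of producing and checking a manifest in the same "<hex>  <name>" layout that sha1sum prints, so either tool can read it.  The SHA1SUMS file name and the OK/FAILED/MISSING labels are assumptions, and it assumes plain file names (sha1sum treats names containing backslashes or newlines specially).

```python
import hashlib
import os


def sha1_of(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(directory, names, manifest_name="SHA1SUMS"):
    """Write one '<hex>  <name>' line per file, as sha1sum does."""
    with open(os.path.join(directory, manifest_name), "w") as out:
        for name in names:
            out.write(f"{sha1_of(os.path.join(directory, name))}  {name}\n")


def check_manifest(directory, manifest_name="SHA1SUMS"):
    """Re-read every listed file; map each name to OK, FAILED or MISSING."""
    results = {}
    with open(os.path.join(directory, manifest_name)) as f:
        for line in f:
            digest, name = line.rstrip("\n").split("  ", 1)
            path = os.path.join(directory, name)
            if not os.path.exists(path):
                results[name] = "MISSING"
            elif sha1_of(path) == digest:
                results[name] = "OK"
            else:
                results[name] = "FAILED"
    return results
```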

I probably make use of the backups once every couple of months, either
to get back a file I deleted and then changed my mind about ("RAID is
not backup"), or, more often, to double-check when something changed.
So I'm confident I can use them to recover if, say, the whole drive fails.
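Because each incremental tar holds a file only when it changed, "when did this change?" amounts to "which archives contain it?".  A rough sketch, assuming the archives have already been decrypted to plain tar or .tar.bz2 files and are passed in date order; versions_of is a made-up name.

```python
import tarfile


def versions_of(archive_paths, member):
    """Return (archive, mtime, size) for each archive that holds member.

    Each hit marks a backup in which that file had changed.
    """
    found = []
    for archive in archive_paths:
        # Default mode "r" detects gzip/bzip2/xz compression by itself.
        with tarfile.open(archive) as tar:
            try:
                info = tar.getmember(member)
            except KeyError:
                continue  # not changed in this backup
            found.append((archive, info.mtime, info.size))
    return found
```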

Some things don't fall into this system, e.g. an ever growing set of
photos.  To avoid making the tar files too big, I just maintain copies
of these, again using `rsync --checksum' and to all the previous
destinations.  This loses the "time machine" aspect but with photos
that's not a problem.  I rarely delete them, and when I do I make sure
I'm very happy before the copies are also deleted.
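Assuming the copies are made without rsync's --delete, a photo removed from the source stays on every copy, so deleting is a deliberate manual step.  A hypothetical helper that only lists what a copy has that the source no longer does, for review before removing anything by hand:

```python
import os


def _rel_files(top):
    """All file paths under top, relative to top."""
    out = set()
    for root, _dirs, files in os.walk(top):
        for name in files:
            out.add(os.path.relpath(os.path.join(root, name), top))
    return out


def only_in_copy(src_dir, copy_dir):
    """Paths present under copy_dir but gone from src_dir. Deletes nothing."""
    return sorted(_rel_files(copy_dir) - _rel_files(src_dir))
```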

When I run the backup script, normally every day, it takes two or three
minutes.  I go and make a cuppa.  At the end I carefully check its brief
output which includes the biggest files that were backed up today.  This
lets me spot when something new, big, and unwanted comes along, like a
newly installed music player that tracks in XML the start and stop of
everything ever played.  I don't have backups run automatically.  I
prefer to delete all the spam, read and delete mailing list stuff, etc.,
before backing up what's left.  It's part of my "reading email first
thing" routine so remembering to do it isn't a problem.
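The "biggest files backed up today" report could be as simple as this sketch, assuming it runs on the day's tar before it's encrypted; biggest_members and the default of five are illustrative choices.

```python
import tarfile


def biggest_members(archive_path, n=5):
    """The n largest regular files in a tar as (size, name), biggest first."""
    with tarfile.open(archive_path) as tar:
        sizes = [(m.size, m.name) for m in tar.getmembers() if m.isfile()]
    return sorted(sizes, reverse=True)[:n]
```

Printing each pair with something like f"{size:>12,}  {name}" gives a short list that makes a new, bloated file easy to spot.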

Cheers,


Ralph.


-- 
Next meeting: Bournemouth, Wednesday 2009-08-05 20:00
Dorset LUG: http://dorset.lug.org.uk/
Chat: http://www.mibbit.com/?server=irc.blitzed.org&channel=%23dorset
List info: https://mailman.lug.org.uk/mailman/listinfo/dorset
