Hi Peter,

> http://blogs.techrepublic.com.com/10things/?p=895

It's odd they mention rsync but not one of the programs that sit above
it, e.g. http://rdiff-backup.nongnu.org/features.html

> which do you use?

GNU tar.  I figure it's a simple file format I can interpret almost
anywhere I'm trying to recover data.  tar's -g option lets me do a full
backup, creating a big tar file, on the first day and then incremental
ones recording just what's changed since the last backup on subsequent
days.  If a file changes, tar stores the whole of the new one, not a
diff, so if there's some problem with an earlier tar file there's no
"chain" of edits that's been broken.  The cost is that a file that just
keeps growing, e.g. xchat(1) IRC logs, gets stored in full every time it
changes.  The backup script copes by checking xchat isn't running and
then moving large log files to an archive name, so the tar file never
ends up containing huge ones every day.

I compress the tar files with bzip2 and encrypt them with gpg(1).  gpg
would compress anyway, but compressing first means that once decrypted
they don't balloon in size when I may not have enough space.

They're then copied from that first drive, which is the one that also
has the data that's being backed up, onto a second internal one that's
normally unmounted and spun down and is just there to store backups.
And also onto an 8GB USB flash drive that's normally stored away from
the PC -- both are unlikely to get nicked -- and that I can easily take
with me if I stay away from home.  Oh, and I upload them to some space
on a friend's geographically distant machine over ssh(1) so the house
can burn down.  If all of those get trashed I've probably bigger
problems to worry about, like finding a shotgun, canned food, and a
place to defend.  :-)

When I say "copy" I really mean rsync, even within the one machine.  By
using its -c/--checksum option I can make it re-read the copies it made
previously, every byte, and ensure that they still match the originals.
Otherwise rsync would just rely on the file size and modification time.
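
To make that difference concrete, here is a rough Python sketch (not
rsync's own code) of the two kinds of comparison: a quick check on size
and modification time, and a content check that has to read every byte
of both files.  The function names are made up for illustration, and
SHA-1 is an arbitrary choice here; rsync uses its own checksum
algorithm.

```python
import hashlib
import os


def quick_check_same(src, dst):
    """Rough stand-in for the default test: same size, same mtime.

    Comparing mtimes to whole seconds is an assumption of this sketch.
    """
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def file_digest(path, chunk_size=1 << 20):
    """Read every byte of the file and return its SHA-1 hex digest."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_same(src, dst):
    """Content test: forces a full read of the original and the copy."""
    return file_digest(src) == file_digest(dst)
```

A copy that has silently gone bad on disk keeps its size and timestamp,
so the quick check would still pass it; only the full read has a chance
of noticing.
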
Re-reading every byte "scrubs" the copies, giving the media a chance to
spot any developing problems, e.g. a hard drive may reallocate a sector
that it only read successfully after a few tries.  It avoids that "my
backups are unreadable" moment when you least want it.

Over time, the "everything" level-0 tar file plus all the incremental
level-1s start getting too big.  I then put them in an ISO, along with
the output of sha1sum(1), and burn three copies to CD.  One goes in the
post to somewhere else.  The other two stay with me.  One of those I
take with me if I'm going away somewhere for a while.  The sha1sum
output means that I can easily test the contents of the CD with
`sha1sum -c'.  How long these will remain readable is unknown, but I'm
using branded CDs and the latest backups are on other media.  Having
three identical copies of the ISO means I should be able to recover a
complete one if needed.  In time, I'll probably move these to something
else if it seems a better bet.

I probably make use of the backups once every couple of months, either
to get back a deleted file where I've changed my mind ("RAID is not
backup") or, more often, to double-check when something changed.  So
I'm confident I can use them to recover if, say, the whole drive fails.

Some things don't fall into this system, e.g. an ever growing set of
photos.  To avoid making the tar files too big, I just maintain copies
of these, again using `rsync --checksum', on all the previous
destinations.  This loses the "time machine" aspect but with photos
that's not a problem.  I rarely delete them, and when I do I make sure
I'm very happy before the copies are also deleted.

When I run the backup script, normally every day, it takes two or three
minutes.  I go and make a cuppa.  At the end I carefully check its brief
output, which includes the biggest files that were backed up today.
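
As an illustration only (this is not the actual backup script, whose
details aren't given here), a "biggest files" report could be produced
from a tar file along these lines in Python.  The function name and the
default count of five are hypothetical, and the sketch assumes the
archive has already been decrypted.

```python
import tarfile


def biggest_members(archive_path, count=5):
    """Return (size, name) pairs for the largest regular files in a tar."""
    # "r:*" lets tarfile detect gzip, bzip2, or xz compression by itself.
    with tarfile.open(archive_path, "r:*") as tar:
        files = [(m.size, m.name) for m in tar if m.isfile()]
    # Largest first; ties are ordered by name, also descending.
    return sorted(files, reverse=True)[:count]
```

Printing the result of something like
biggest_members("today.tar.bz2") at the end of a run, with whatever the
day's archive is really called, gives a short list that is quick to
read.
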
Seeing the biggest files lets me spot when something new, big, and
unwanted comes along, like a newly installed music player that tracks in
XML the start and stop of everything ever played.

I don't have backups run automatically.  I prefer to delete all the
spam, read and delete mailing list stuff, etc., before backing up what's
left.  It's part of my "reading email first thing" routine so
remembering to do it isn't a problem.

Cheers, Ralph.

--
Next meeting: Bournemouth, Wednesday 2009-08-05 20:00
Dorset LUG: http://dorset.lug.org.uk/
Chat: http://www.mibbit.com/?server=irc.blitzed.org&channel=%23dorset
List info: https://mailman.lug.org.uk/mailman/listinfo/dorset