On Mon, Apr 13, 2009 at 02:20:34PM -0700, Rogan Creswick wrote:
> I started using Dirvish to manage backups of a couple important
> directories at work in early March, and I just realized that the
> nightly backups are taking up an enormous amount of space (about 30x
> more than expected ;).
You should understand that dirvish uses hard links to "duplicate"
copies of files that are identical from one image to the next (see
the --link-dest sketch below). Files that make small changes over
time are not identical, and this includes accumulating log files,
databases, git repositories, etc. Change one byte in the binary blob
that represents a git repository, and it is a "new" file, stored
again in full. Yes, there are tools like rdiff-backup that compute
and store the differences between files. This uses less room.

But before using any backup tool, contemplate the end goal. The goal
is /not/ to make /backups/, but to make occasional /restores/. And
what are you restoring, under what conditions? Sometimes it is a
damaged file, sometimes it is hacker damage, sometimes it is a zapped
disk. Often, you want to get at the version of a file made three
weeks ago, not last night's version. Sometimes, the same problem that
damaged your original can damage your backup copy (if they are both
mounted and online). In all these cases, every restore is
/unplanned/, and occurs at times when you are busy and haven't set
aside time to recover a file or rebuild a disk.

The major advantage of a tool like dirvish is that the images are
complete file systems, ready to copy onto a hard drive. The backups
are searchable and executable file systems. And if you manage your
data correctly, you can get a heck of a lot of images onto a large
hard drive (1500GB hard drives cost less than $120 on sale).

But backups will have special cases, usually involving binary blobs
like vmware images and databases. Often those already have "backup
nature": the current copy already contains the old copies, so you
don't need more than one copy (plus a few redundant copies). For
those you can use the dirvish "expire" feature, aggressively, in a
different vault (see the config sketch below). I have a "binary blob"
directory on my system disk with soft links to these big files, and
exclude them from the main backups. That blob directory is stored in
a separate vault, and expired after a few days.

With tools like rdiff-backup, you get the daily copies, but the files
are not stored as a file system and are not executable, so they are
typically restored manually, one by one. This is an option for people
with inconsequential lives, who aren't delaying anything important by
spending an occasional day or two rebuilding hard drives. Personally,
I go for big drives and appropriate backup strategies, so that
restores can be done with a few lines of commands, possibly restoring
vaults from a mix of times. After a belatedly discovered system
compromise, or more likely a botched update, I can restore
yesterday's user files and last month's system binaries.

IMHO, no backup system is designed correctly, as a pre-restore
system. Whatever way you go, there will be inconveniences and
compromises, because these systems are designed to packrat data, not
to recover the right data to the right places, in a hurry. Whatever
backup system you choose, you should test your restore procedure onto
a blank hard drive (which you should have handy, just in case; a
sketch of that drill follows below), and make your decisions based on
how easily that goes. Otherwise, backups are just a religious ritual,
not a prudent preparation for future restores.

Keith

--
Keith Lofstrom          kei...@keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs
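P.S. For the curious, here is a minimal sketch of the hard-link trick
described above. Dirvish drives rsync with --link-dest internally;
the paths and dates here are hypothetical:

    # Yesterday's image already exists at /backup/home/2009-04-12/tree.
    # Unchanged files in tonight's image become hard links into it;
    # files that changed by even one byte are stored again in full.
    rsync -a --link-dest=/backup/home/2009-04-12/tree \
          /home/ /backup/home/2009-04-13/tree/

    # Same inode number = hard link = no extra space consumed:
    ls -li /backup/home/2009-04-1[23]/tree/some-unchanged-file

This is why a multi-gigabyte vmware image or database file that
changes every night costs you its full size, every night.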
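A sketch of the two-vault arrangement for binary blobs, under
assumptions about your bank layout; the vault names and paths are
hypothetical, and dirvish.conf(5) has the exact expire syntax:

    # /backup/home/dirvish/default.conf -- the main vault;
    # the blob directory is excluded here.
    client: localhost
    tree: /home
    index: gzip
    exclude:
            blobs/
    expire-default: +6 months

    # /backup/blobs/dirvish/default.conf -- the blob vault,
    # expired aggressively after a few days.
    client: localhost
    tree: /home/blobs
    expire-default: +3 days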
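And a sketch of the restore drill onto that blank hard drive. Since a
dirvish image is a plain file tree, a restore is just a copy; the
device names, vault names, and dates are hypothetical:

    # Assume the blank drive is /dev/sdb, already partitioned.
    mkfs.ext3 /dev/sdb1
    mkdir -p /mnt/restore
    mount /dev/sdb1 /mnt/restore

    # Copy last night's image back; the trailing slashes matter to rsync.
    rsync -a /backup/home/2009-04-13/tree/ /mnt/restore/

    # Restoring from a mix of times: yesterday's user files above,
    # last month's system binaries from a hypothetical "root" vault.
    rsync -a /backup/root/2009-03-13/tree/usr/ /mnt/restore/usr/

Time how long the whole drill takes. That number, not the size of
your vaults, is what you are really buying with a backup system.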