I had always been confused by how branches are used in Dirvish, and the wiki article on dirvish.org wasn't very clear to me. Recently, while pondering it (and digging through the source code), it dawned on me how they work, and also a possible use case (for me at least). First, a quick summary:
A vault represents a collection of branches all residing on the same file system. This is required so that hard links can exist between copies and keep disk usage down to a minimum. Each branch shares the same directory structure in the vault, but has its own independent configuration and history file. A branch contains one or more images, and each image references an earlier image from the same branch. Each run of dirvish creates one new image for one branch in one vault. The branch "default" is assumed if none is specified. The vault must always be specified.

When creating a new image, the --reference option can be passed to dirvish to specify that an image from a different branch is to be used as the reference point. This is the one feature that differentiates using branches from using vaults. If two branches don't share any images as a reference point, then it's no different than using two vaults.

Now, for a different use case than the one in the wiki article. What if two branches point to the same filesystem, but use different configurations, perhaps a different exclude list? I might use the default branch to back up the whole filesystem, but use a second branch with an exclude list to leave out heavily modified but non-critical areas of the filesystem, such as the directory tree used for automatic nightly builds. There's no need to keep each night's copy backed up, so I will exclude it from the daily branch. The default branch can run on Sunday, and for the other 6 days of the week I will run the daily branch.

One issue remains with this approach: files that get modified and are in both the daily branch and the default branch will fork into two copies, as Dirvish by default only uses images from the same branch for hard links on new images.
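The fork described above can be reproduced with a small simulation. This is only a rough sketch, not how Dirvish or rsync is implemented: the image names (default-0, daily-0, ...), the file name report.txt, and the functions make_image, count_copies and simulate are all made up for illustration. Files are compared by content here, whereas rsync's quick check normally compares size and modification time. The "shared reference" variant assumes a hypothetical policy where every new image references the most recent image of either branch.

```python
import os
import tempfile
from pathlib import Path


def make_image(source, image_dir, reference_dir=None):
    """Create one image, hard-linking unchanged files to the reference image.

    Simplification: "unchanged" means identical content.
    """
    image_dir.mkdir(parents=True)
    for name, data in source.items():
        target = image_dir / name
        previous = reference_dir / name if reference_dir is not None else None
        if previous is not None and previous.exists() and previous.read_bytes() == data:
            os.link(previous, target)  # unchanged: share the existing inode
        else:
            target.write_bytes(data)   # new or modified: store a fresh copy


def count_copies(images, name, data):
    """Count distinct inodes that hold `data` under `name` across the images."""
    inodes = set()
    for image in images:
        path = image / name
        if path.exists() and path.read_bytes() == data:
            inodes.add(path.stat().st_ino)
    return len(inodes)


def simulate(vault, share_reference):
    """Sunday default, Monday daily, file changes, Tuesday daily, Sunday default.

    Returns how many separate copies of the modified file end up on disk.
    """
    v1 = {"report.txt": b"version 1"}
    v2 = {"report.txt": b"version 2"}
    default_0, daily_0 = vault / "default-0", vault / "daily-0"
    daily_1, default_1 = vault / "daily-1", vault / "default-1"

    make_image(v1, default_0)
    # First daily image: standalone, or based on the latest default image.
    make_image(v1, daily_0, default_0 if share_reference else None)
    # The file is modified; the daily branch notices it first.
    make_image(v2, daily_1, daily_0)
    # Next default image: same-branch reference, or the latest image overall.
    make_image(v2, default_1, daily_1 if share_reference else default_0)

    images = [default_0, daily_0, daily_1, default_1]
    return count_copies(images, "report.txt", v2["report.txt"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("independent branches:", simulate(Path(tmp) / "a", False))
        print("shared reference:   ", simulate(Path(tmp) / "b", True))
```

With independent histories the modified file is stored twice, once per branch; with the shared reference it is stored once. The sketch needs a filesystem that supports hard links.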
One solution to that might be using --reference to keep daily based on default, but I'd still end up with two copies if the new version of the file is first noticed by a daily backup and then later by the default backup. After that default backup, all dailies from then on will hard link to the copy from default instead of the original copy in the first daily. Since my daily backups will expire sooner than my weekly full default backups, those extra copies will eventually go away instead of staying around as they would if the two branches stayed independent history-wise.

This brings up another point that I would like to eventually solve, though the issue is really with rsync, not Dirvish. Moving a large amount of data from one place to another causes all of that data to be duplicated on the backups. It makes restructuring data on the server difficult if I don't have huge margins on my backup media. Ideally, rsync would track inode numbers and hard link the files at their new location to the copies already backed up from their old location, but alas, rsync does not.

Other backup solutions, such as dump/restore, track files solely by their metadata, looking at the ctime/mtime for updates and only backing up files with times newer than the start of the last backup. In the case of moving a single 4GB directory, only the listings of the old parent directory and the new parent directory need to be backed up, which can be measured in kilobytes or less. This level of backup could reasonably be achieved with rsync if it maintained a mapping of source inode/device numbers to destination paths in the sync. Then, when rsync detects a new file on the source, it can look up the old path on the destination by inode/device and detect whether it can hard link, or at least save bandwidth by using rolling checksums.

-- 
Loren M. Lang
[email protected]
http://www.alzatex.com/

Public Key: ftp://ftp.tallye.com/pub/lorenl_pubkey.asc
Fingerprint: 10A0 7AE2 DAF5 4780 888A 3FA4 DCEE BB39 7654 DE5B

_______________________________________________
Dirvish mailing list
[email protected]
http://www.dirvish.org/mailman/listinfo/dirvish
