Hi list,

I could probably quote roughly 1 GB from this discussion, or top-post and append the whole thread for those of you who want to read it again, but I won't.
I just want to share some thoughts that seem to be missing from this discussion so far - for whatever use anyone can make of them.

* The BackupPC pool file system is, generally speaking, made up of file system blocks. No logical entity within the file system will be shifted forward or backward by any amount of space that is *not* an integral multiple of the file system block size - and probably not even that. (There may be exceptions such as reiserfs tails, but I doubt they're worth taking into account. Hmm, there are also directory entries - are they worth thinking about?)

* rsync calculates *rolling* block *checksums* in order to re-match data at an offset any number of bytes away. While "rolling" does not hurt much when applied to a whole file system (a bit of performance at most), it provides no benefit there either. The "checksums" part may actually hurt: there are bound to be collisions, which would, if I'm not mistaken, trigger a second pass over the "file(s)". That involves a *lot* of disk I/O, if not bandwidth. The approach someone mentioned - md5sums over individual large, non-rolling blocks - is bound to make much more sense for a file system than the rsync algorithm.

* *Data within the pool* is *never* modified. Files are created, linked to, and deleted. That's it. [Wait, that's not quite right: rsync checksums may be added later, but they're *appended*, aren't they? And only once, anyway.] A few *small* files are modified (log files, the backups files, perhaps a .bash_history file or other things outside the scope of BackupPC). *Existing directories* are modified as new pool files are added and expired ones removed. The same applies to pc/$host directories regarding new and expired backups. *New pc/$host/$num directory hierarchies* are created. *Inode information* is modified heavily. Forget about the ctime, which may vary between file systems. The *link count* is modified, meaning the inode is modified.
Not for every file in the pool, but for every file that was linked to (or unlinked from, because of an expiring backup). Other metadata, such as *block usage bitmaps*, is modified as well.

To sum it up, modifications since the last "backup" of the pool FS will consist of *new files and directories* and *changed file system metadata*. Presumably, your backups consist mostly of *data* (you might want to check that), and a large part of that will be static. 300 GB of daily changes on a 540 GB pool seems extremely unlikely; 8 GB seems more like it.

* VMDK files, in my experience, do not resemble raw disk images too closely. I only use the non-preallocated variant, and it seems to be well optimized for storing wide-spread writes (think "mkfs") in a small amount of data. A preallocated VMDK may be a completely different matter. But it's a proprietary format, isn't it? Is there a public spec? Do you know what design decisions were made, and why? In any case: how much data do you need to fully represent the changes made to the virtual file system? Does the VMDK change more or less than the file system it represents? By what factor? Is a VMDK also a logical block array, or may information shift by non-blocksize distances?

* You could probably use DRBD or NBD to mirror to a partition inside a VMware guest, presuming you really want to do that.

All of that said, I find the approach of incrementally copying the block device quite appealing, presuming it proves to work well (and I'm not yet convinced that rsync is the optimal tool to copy it with). It simply avoids some of the problems of a file-based approach, but it also has drawbacks of its own, meaning it won't work for everyone (e.g.
you can't change the FS type;
you need storage for the full device size, and bandwidth for the initial transfer;
you may need bandwidth for a full transfer on *restore*;
you'll need enough space for the image on restore, even if only a fraction of the FS is in use;
resizing the source FS may lead to a very long incremental transfer;
you can't back up anything other than the *complete* FS the pool is on;
it won't protect you from slowly accumulating FS corruption, as you're copying that into your backup;
...).

I'm really interested in hearing about your experiences with this, but as <backuppc-users> is currently running in degraded read-mostly mode for me due to sheer volume, don't expect me to join the discussion on a regular basis :).

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
[email protected]
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
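[Editorial addendum] The "md5sums over individual large (non-rolling) blocks" idea discussed above can be made concrete with a short sketch. This is not BackupPC or rsync code; the block size, file paths, and function names are all illustrative assumptions. It updates a destination image in place by copying only the fixed-size blocks whose md5 digests differ - no rolling window, which is fine if (per the first point above) data only moves by whole file-system blocks:

```python
# Hypothetical sketch of fixed-block incremental image sync.
# Assumptions: both files already exist, and the 1 MiB block size is an
# arbitrary illustrative choice, not anything BackupPC prescribes.
import hashlib

BLOCK_SIZE = 1 << 20  # 1 MiB; large non-rolling blocks, unlike rsync's
                      # byte-granular rolling checksum

def block_digests(path, block_size=BLOCK_SIZE):
    """Return one md5 digest per fixed-size block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.md5(block).digest())
    return digests

def sync_blocks(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy only the blocks of src that differ from dst; return how many."""
    old = block_digests(dst_path, block_size)
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        index = 0
        while True:
            block = src.read(block_size)
            if not block:
                break
            if index >= len(old) or hashlib.md5(block).digest() != old[index]:
                dst.seek(index * block_size)
                dst.write(block)
                copied += 1
            index += 1
    # NOTE: does not truncate dst if src shrank - omitted for brevity.
    return copied
```

On a pool FS where only a few GB change per day, a scheme like this transfers just the changed blocks (plus the digest-scan I/O on both sides), and it never needs the collision-resolving second pass that a weak rolling checksum can trigger.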
