Has anyone looked at EMC's Avamar product? It seems like it might do what you want, and it specifically mentions virtual machines as well.
rgt

On 02/22/2010 10:59 AM, Dean Anderson wrote:
>
>> . 50G sparse-file VMware virtual disk, containing a Windows XP
>> installation, 22G used.
>>
>> . Back it up once. 22G goes across the network. It takes 30 mins.
>>
>> . Boot into XP, change a 1K file, shut down. Including random
>> registry changes, system event logs, and other random changes, imagine
>> that a total of twenty 1K blocks have changed.
>>
>> . Now do an incremental backup. Sure, you may need to scan the file
>> looking for which blocks changed, but you can do that as fast as you
>> can read the whole file once, assuming you kept some sort of checksums
>> from the previous time. Then just send 20K across the net. This should
>> complete at least 5x faster than before ... which means at most 6 mins.
>
> But there is no /backup/ technology that I know of that does that now.
> A checksum on the whole file won't tell you which /block/ changed; one
> would need a checksum on /each/ block, and I don't know of any backup
> system that does that. The backup log would be a significant fraction
> of the filesystem, or if sparse, a significant fraction of the data.
> Let's say you have a 1K block you want to monitor for changes, and you
> use a 160-bit MAC to ensure that changed blocks don't collide on the
> same sum. See the problem? Not to mention the cost of computing the
> checksums during backup and looking them up in a database, which has
> its own overhead. The backup system could become the major load on the
> server.
>
> Of course, a versioning filesystem doesn't do it that way. It just
> keeps pointers to a copy-on-write set of blocks that have changed,
> rather like virtual memory, starting with the root inode (actually
> plural). One only needs to compare the two root inodes to find which
> blocks have changed between them. At some gross over-simplification,
> just "lather, rinse, repeat" for the rest of the inodes in the
> filesystem. You get the point.
>
> It would indeed be _nice_ if only the 20K that changed were sent, but
> there aren't many filesystems that /can/ indicate anything more than
> "the file changed since the last time it was backed up". Hence the
> backup program has to read/restore the entire file. "Incremental
> backup" refers to the whole filesystem, not to the blocks of files.
>
>> . If you do this with tar or dump ... even with compression ... still
>> 22G goes across the net. Another 30-minute backup.
>>
>> Is it clear now?
>
> Indeed. To do what you want (send only the 20K that changed), one
> needs a versioning filesystem, like AFS or NetApp's filesystem. What
> you want to do is intimately related to the filesystem's ability to
> track which blocks have changed. AFS, for example, keeps one version
> back as the 'backup fileset', and an AFS incremental backup takes only
> the blocks that differ from the backup fileset. NetApp keeps 10
> versions, but I don't remember how the NetApp backup works. There are
> efforts in AFS to allow more versions. I don't know of any other
> filesystems that keep version information. Ordinary filesystems (like
> FFS, ext2/3, NTFS) don't keep track of which blocks have changed since
> the last backup.
>
> But your point should be well taken by FS implementors: we need
> versioning filesystems.
>
>
> --Dean
>
> _______________________________________________
> bblisa mailing list
> [email protected]
> http://www.bblisa.org/mailman/listinfo/bblisa
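For what it's worth, the per-block checksum scheme Dean describes above is roughly what rsync-style delta tools do on the sending side. Here's a minimal sketch in Python of the idea (the function names and 1K block size are mine, taken from the example in the thread, not from any real backup product); it doesn't handle files that shrink, and a real tool would persist the digest list in a database between runs, which is exactly the overhead Dean is worried about:

```python
import hashlib

BLOCK_SIZE = 1024  # 1K blocks, as in the example above


def block_digests(path):
    """Read the whole file once and return a per-block list of
    SHA-1 digests (160 bits each, matching the 160-bit MAC idea)."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digests.append(hashlib.sha1(block).hexdigest())
    return digests


def changed_blocks(old_digests, new_digests):
    """Yield indices of blocks that are new or differ from the
    previous run; only these BLOCK_SIZE-byte blocks need to cross
    the network on an incremental backup."""
    for i, digest in enumerate(new_digests):
        if i >= len(old_digests) or old_digests[i] != digest:
            yield i
```

So for the 50G disk image with twenty changed 1K blocks, `changed_blocks` would return twenty indices and the backup would ship ~20K of data, but computing `block_digests` still requires reading all 22G of live data, which is why a copy-on-write filesystem that already knows the changed blocks (via its root-inode comparison) avoids the scan entirely.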
