Has anyone looked at EMC's Avamar product? It seems like it might do
what you want it to do. It specifically mentions virtual machines as well.

rgt

On 02/22/2010 10:59 AM, Dean Anderson wrote:
> 
>> - 50G sparse-file VMware virtual disk, containing a Windows XP
>> installation, 22G used.
>>
>> - Back it up once.  22G go across the network.  It takes 30 mins.
>>
>> - Boot into XP, change a 1K file, shut down.  Including random
>> registry changes, system event logs, and other random changes, imagine
>> that a total of twenty 1K blocks have changed.
>>
>> - Now do an incremental backup.  Sure, you may need to scan the file
>> looking for which blocks changed, but you can do that as fast as you can
>> read the whole file once, assuming you kept some sort of checksums from the
>> previous time.  And then just send 20K across the net.  This should complete
>> at least 5x faster than before ... which means at most 6 mins.
> 
> But there is no /backup/ technology that does that now, that I know of.
> A checksum on the whole file won't tell you which /block/ changed.  One
> would need checksums on /each/ block, and I don't know of any backup
> system that keeps them.  The checksum log would be a significant
> fraction of the filesystem, or if sparse, a significant fraction of the
> data.  Let's say you have a 1K block and you want to monitor it for
> changes, using a 160-bit MAC to make collisions between the old and new
> sums negligible.  See the problem?  Not to mention the cost of computing
> the checksums during backup and looking them up in a database, which has
> its own overhead.  The backup system could become the major load on the
> server.
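[A minimal sketch of the per-block checksum scan being described — illustrative Python, not any real backup tool's code; block size and hash choice are assumptions:]

```python
# Sketch: detect changed 1K blocks by comparing per-block digests
# against the digest list kept from the previous backup run.
import hashlib

BLOCK = 1024  # 1K blocks, as in the example above


def block_digests(path):
    """Return one SHA-1 digest (20 bytes) per 1K block of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            digests.append(hashlib.sha1(chunk).digest())
    return digests


def changed_blocks(old_digests, new_digests):
    """Indices of blocks whose digest differs, or that are new."""
    return [i for i, d in enumerate(new_digests)
            if i >= len(old_digests) or old_digests[i] != d]
```

Note the overhead Dean describes: even with 20-byte digests per 1K block, the digest table is about 2% of the data — roughly 1G of checksums for the 50G disk — and every block must still be read and hashed on every run.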
> 
> Of course, a versioning filesystem doesn't do it that way.  It just
> keeps pointers to a copy-on-write set of blocks that have changed,
> rather like virtual memory, starting with the root inode (actually
> plural).  One only needs to compare the two root inodes to find which
> blocks have changed between them.  At the risk of gross
> over-simplification, just 'lather, rinse, repeat' for the rest of the
> inodes in the filesystem.  You get the point.
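[The copy-on-write comparison described above can be sketched as a toy tree — illustrative Python, nothing like a real filesystem's on-disk layout: snapshots share unchanged subtrees, so diffing two roots can skip any child whose pointer is identical in both.]

```python
# Toy copy-on-write tree: updates build a new root that shares every
# unmodified subtree with the old one, so a diff of two roots touches
# only the changed paths.
class Node:
    def __init__(self, children=None, data=None):
        self.children = children or {}   # name -> Node
        self.data = data                 # leaf payload (a "block")


def cow_update(root, path, data):
    """Return a new root sharing all unmodified subtrees with `root`."""
    if not path:
        return Node(data=data)
    new = Node(children=dict(root.children), data=root.data)
    child = root.children.get(path[0], Node())
    new.children[path[0]] = cow_update(child, path[1:], data)
    return new


def diff(old, new, prefix=()):
    """Yield paths to changed leaves, skipping shared subtrees."""
    if old is new:
        return            # same pointer: whole subtree unchanged, skip it
    if new.data != old.data:
        yield prefix
    for name, child in new.children.items():
        yield from diff(old.children.get(name, Node()), child,
                        prefix + (name,))
```

The `old is new` pointer test is the whole trick: the incremental backup never reads the unchanged subtrees at all, which is exactly what the checksum-scanning approach cannot avoid.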
> 
> It would indeed be _nice_ if only the 20K changed were sent, but there
> aren't many filesystems that /can/ indicate anything more than "the file
> changed since the last time it was backed up".  Hence the backup program
> has to read/restore the entire file.  "Incremental backup" refers to the
> whole filesystem, not to the blocks of individual files.
> 
>> .  If you do this with tar or dump ... even with compression ... still
>> 22G goes across the net.  Another 30 minute backup.
>>
>> Is it clear now?
> 
> Indeed.  To do what you want (only send the 20K that changed), one needs
> a versioning filesystem, like AFS or NetApp's WAFL.  What you want to do
> is intimately tied to the filesystem's ability to track which blocks
> have changed.  AFS, for example, keeps one version back as the 'backup
> fileset', and an AFS incremental backup takes only the blocks that
> differ from the backup fileset.  NetApp keeps 10 versions, though I
> don't remember how NetApp's backup works.  There are efforts in AFS to
> allow more versions.  I don't know of any other filesystems that keep
> version information.  Ordinary filesystems (like FFS, ext2/3, NTFS)
> don't track which blocks have changed since the last backup.
> 
> But your point should be well taken by FS implementors: we need
> versioning filesystems.
> 
> 
>               --Dean
> 

_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa