Craig Barratt wrote at about 00:31:58 -0800 on Wednesday, March 2, 2011:
 > Jeffrey suggested I outline some of the features in 4.0 to solicit
 > feedback and discussion.  I apologize for the delay in doing this.
 > I've been exceptionally busy during the last few months.

Thanks Craig -- this is awesome (I assume I am the Jeffrey you
reference LOL). I hope you don't mind my adding my comments and
suggestions though they will certainly be less well thought through
than the months of effort you have put into this to date...

 > In 4.0, backup storage is quite different. The most recent backup
 > is always filled, and the prior backups are stored as reverse time
 > deltas. There could be no connection between a full/incremental and
 > backups being filled or not.

 > There is no longer a concept of $Conf{IncrLevels}. It is possible
 > to have earlier backups filled to reduce the number of merges
 > required to reconstruct an old backup. Also, there is no need to
 > store full directory trees on the deltas - only filled backups
 > store a full tree.

Does this mean that the notion of incremental vs. full backup goes
away and instead we just talk about filled vs. not-filled backups?

This would make sense since already in 3.x, when using rsync
(particularly with checksum caching), the difference between full
and incremental backups was getting blurred and was less stark than
in traditional backup schemes: even in a full, if a file didn't
change, nothing is added to the pool, and nothing is even transferred
if the rsync digest is cached.
 
If so, does that mean that there is now a concept of how many backups
between fills that replaces the old incremental level concept?

And if so, does the choice of how many backups between fills really
just boil down to a tradeoff between storage efficiency (due to
duplication of directory trees and attrib files) and speed of
reconstructing intermediate backups?
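
To make the tradeoff concrete, here is a toy Python sketch of how I
picture reconstruction working. This is not BackupPC code: the names
(reconstruct, DELETED, the dict-of-dicts layout) are all made up for
illustration, and it assumes an unfilled backup stores only the entries
that differ from the next newer backup.

```python
# Toy model: a backup is a dict mapping path -> content.
# DELETED is a hypothetical marker meaning "this path did not exist
# in the older backup".
DELETED = object()

def reconstruct(backups, filled, n):
    """Rebuild backup number n.

    backups: {number: dict}; for filled backups the dict is the full
             tree, for unfilled ones it is a reverse delta.
    filled:  set of backup numbers that hold a full tree.
    Returns (tree, number_of_merges).
    """
    # Start from the nearest filled backup that is n or newer.
    start = min(k for k in filled if k >= n)
    tree = dict(backups[start])
    # Walk backwards in time, applying each reverse delta in turn.
    for k in range(start - 1, n - 1, -1):
        for path, content in backups[k].items():
            if content is DELETED:
                tree.pop(path, None)
            else:
                tree[path] = content
    return tree, start - n
```

If I have this right, filling every F backups caps the work at F-1
merges per restore, at the cost of storing one more full directory
tree per fill.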


 > A backup starts by simply renaming the most recent backup, eg,
 > $TopDir/pc/HOST/15 to $TopDir/pc/HOST/16, and an initially empty
 > tree is created below $TopDir/pc/HOST/15.  The backup proceeds by
 > using $TopDir/pc/HOST/16 as the reference (in the case of rsync),
 > and each time there is a change, $TopDir/pc/HOST/16 is updated,
 > and the opposite change is made below $TopDir/pc/HOST/15.  This
 > is a big improvement over 3.x since very few disk writes are
 > needed if the client data hasn't changed very much (currently 3.x
 > creates a full tree of hardlinks when you do a full backup even
 > with no changes).

This is awesome! Certainly should be a big speedup given that disk IO
is often the primary bottleneck.
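
Just to check that I follow the mechanics, here is how I would sketch
the update step in Python. Again this is only a toy in-memory model
with made-up names (run_backup, DELETED), not the real implementation:
"filled" stands in for the renamed tree (16 in your example) and the
returned dict stands in for the initially empty tree (15).

```python
# Hypothetical marker: "this path did not exist in the older backup".
DELETED = object()

def run_backup(filled, client):
    """Update `filled` in place to match `client`; return the reverse delta.

    filled: dict path -> content for the most recent backup (mutated).
    client: dict path -> content for the current client state.
    """
    delta = {}
    # Snapshot the union of paths first so we can mutate `filled` safely.
    for path in set(filled) | set(client):
        old = filled.get(path, DELETED)
        new = client.get(path, DELETED)
        if old == new:
            continue  # unchanged: nothing is written anywhere
        delta[path] = old  # record the opposite change in the older backup
        if new is DELETED:
            del filled[path]
        else:
            filled[path] = new
    return delta
```

In this model an unchanged client produces an empty delta and no
writes at all, which is where I would expect the speedup to come from.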

 > This approach changes the deletion dependencies.  The oldest backup can
 > be deleted at any time, and more generally the oldest backup of a chain
 > (ie: if the next older one is filled) can be deleted at any time.  Any
 > other backup can be deleted too, but it requires the deltas to be merged
 > with the next older backup.  A filled backup can be deleted too, and it
 > will be merged to create a new filled backup with the prior deltas.

Am I right in assuming that this means you are exposing Perl library
routines (and maybe even full scripts) that:
1. Allow one to manually convert a non-filled to a filled backup
2. Delete any backup (whether filled or unfilled) and automatically
   fill the prior backups so that the integrity of the chain is
   preserved.
3. Conversely, is there a routine that would "unfill" a filled backup
   by converting it to a delta relative to the next most recent
   backup? (this should be possible and could be useful in some cases)
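
For the delete-in-the-middle case, this is roughly the merge I have in
mind, as a toy Python sketch. The names (merge_deltas, apply_delta,
DELETED) are hypothetical and the model assumes each reverse delta
only lists entries that differ from the next newer backup.

```python
# Hypothetical marker: "this path did not exist in the older backup".
DELETED = object()

def apply_delta(tree, delta):
    """Return the older tree obtained by applying a reverse delta."""
    result = dict(tree)
    for path, content in delta.items():
        if content is DELETED:
            result.pop(path, None)
        else:
            result[path] = content
    return result

def merge_deltas(doomed, older):
    """Fold the delta of a backup being deleted into the next older delta.

    doomed: reverse delta of the backup being deleted.
    older:  reverse delta of the next older backup.
    The result takes the newer neighbour of `doomed` straight to the
    older backup; entries in `older` win where both touch a path.
    """
    merged = dict(doomed)
    merged.update(older)
    return merged
```

Deleting a filled backup would presumably be the same idea with
apply_delta: apply the next older delta to the full tree and keep the
result as the new filled backup.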


 > Everything described above is already implemented.
 > 
 > The one open issue is when and how an intermediate filled backup
 > is created.  One approach is to continue to connect the concept
 > of a full backup and a filled backup (although the design allows
 > them to be decoupled).

I'm really struggling to understand what benefit there is anymore to
the notion of a "full" backup and whether it just adds more confusion
to have both a full vs. incremental and a filled vs. unfilled concept.
 > 
 > The code continues to support the 3.x storage format, so you can
 > upgrade to 4.x and still access/view/restore the 3.x backups.
 > However, the first backup after upgrading to 4.x will need to be
 > a full to establish the first filled reference backup.

As per my earlier email, is there code to do a one-time forward
conversion of 3.x backups to 4.x backups? The goal of course would be
to get rid of the hardlinks while also benefiting from the more robust
md5sum checksumming of 4.x.

_______________________________________________
BackupPC-devel mailing list
BackupPC-devel@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
