Craig Barratt wrote at about 00:31:58 -0800 on Wednesday, March 2, 2011:
> Jeffrey suggested I outline some of the features in 4.0 to solicit
> feedback and discussion. I apologize for the delay in doing this.
> I've been exceptionally busy during the last few months.

Thanks Craig -- this is awesome (I assume I am the Jeffrey you
reference LOL). I hope you don't mind my adding my comments and
suggestions, though they will certainly be less well thought through
than the months of effort you have put into this to date...

> In 4.0, backup storage is quite different. The most recent backup
> is always filled, and the prior backups are stored as reverse time
> deltas. There could be no connection between a full/incremental and
> backups being filled or not.
> There is no longer a concept of $Conf{IncrLevels}. It is possible
> to have earlier backups filled to reduce the number of merges
> required to reconstruct an old backup. Also, there is no need to
> store full directory trees on the deltas - only filled backups
> store a full tree.

Does this mean that the notion of incremental vs. full backup goes
away and instead we just talk about filled vs. not-filled backups?
This would make sense, since already in 3.x, when using rsync
(particularly with checksum caching), the difference between full and
incremental was getting blurred and was less stark than in traditional
notions: even in a full, if a file didn't change, nothing is added to
the pool, and nothing is even transferred if the rsync digest is
cached.

If so, does that mean that there is now a concept of how many backups
between fills that replaces the old incremental level concept? And if
so, does the choice of how many backups between fills really just boil
down to a tradeoff between storage efficiency (due to duplication of
directory trees and attrib files) and the speed of reconstructing
intermediate backups?

> A backup starts by simply renaming the most recent backup, eg,
> $TopDir/pc/HOST/15 to $TopDir/pc/HOST/16, and an initially empty
> tree is created below $TopDir/pc/HOST/15.
> The backup proceeds by
> using $TopDir/pc/HOST/16 as the reference (in the case of rsync),
> and each time there is a change, $TopDir/pc/HOST/16 is updated,
> and the opposite change is made below $TopDir/pc/HOST/15. This
> is a big improvement over 3.x since very few disk writes are
> needed if the client data hasn't changed very much (currently 3.x
> creates a full tree of hardlinks when you do a full backup even
> with no changes).

This is awesome! It should certainly be a big speedup, given that disk
IO is often the primary bottleneck.

> This approach changes the deletion dependencies. The oldest backup can
> be deleted at any time, and more generally the oldest backup of a chain
> (ie: if the next older one is filled) can be deleted at any time. Any
> other backup can be deleted too, but it requires the deltas to be merged
> with the next older backup. A filled backup can be deleted too, and it
> will be merged to create a new filled backup with the prior deltas.

Am I right in assuming that this means you are exposing Perl library
routines (and maybe even full scripts) that:
1. Allow one to manually convert a non-filled backup to a filled backup.
2. Delete any backup (whether filled or unfilled) and automatically
   fill the prior backups so that the integrity of the chain is
   preserved.
3. Conversely, "unfill" a filled backup by converting it to a delta
   relative to the next most recent backup? (This should be possible
   and could be useful in some cases.)

> Everything described above is already implemented.
>
> The one open issue is when and how an intermediate filled backup
> is created. One approach is to continue to connect the concept
> of a full backup and a filled backup (although the design allows
> them to be decoupled).

I'm really struggling to understand what benefit there is anymore to
the notion of a "full" backup, and whether having both a full vs.
incremental concept and a filled vs. unfilled concept just adds
confusion.
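
To check my own understanding of the reverse-delta scheme and the
deletion rules quoted above, here is a minimal toy sketch in Python.
It is only my reading of the description, not the 4.0 code: the names
(Backup, add_backup, reconstruct, delete_backup) are hypothetical, a
backup is modeled as an in-memory dict of path -> content string rather
than a directory tree, and it models the end result of a backup rather
than the rename-and-update-in-place mechanics.

```python
from dataclasses import dataclass

DELETED = None  # tombstone: the path did not exist in the older backup


@dataclass
class Backup:
    filled: bool
    # filled: path -> content; delta: path -> older content, or DELETED
    files: dict


def apply_delta(view, delta):
    """Apply a reverse delta to a newer view and return the older view."""
    older = dict(view)
    for path, content in delta.items():
        if content is DELETED:
            older.pop(path, None)
        else:
            older[path] = content
    return older


def add_backup(chain, new_state):
    """Append a new filled backup; the previous newest becomes a reverse delta."""
    if chain:
        old = chain[-1].files  # newest backup is assumed to be filled
        delta = {}
        for path in old.keys() | new_state.keys():
            if old.get(path) != new_state.get(path):
                delta[path] = old.get(path, DELETED)
        chain[-1] = Backup(filled=False, files=delta)
    chain.append(Backup(filled=True, files=dict(new_state)))


def reconstruct(chain, index):
    """Rebuild chain[index] by walking back from the nearest newer filled backup."""
    start = index
    while not chain[start].filled:
        start += 1
    view = dict(chain[start].files)
    for i in range(start - 1, index - 1, -1):
        view = apply_delta(view, chain[i].files)
    return view


def delete_backup(chain, index):
    """Delete chain[index]; merge it into the next older backup if that one depends on it."""
    victim = chain[index]
    if index > 0 and not chain[index - 1].filled:
        older = chain[index - 1]
        if victim.filled:
            # Deleting a filled backup: the next older delta becomes filled.
            chain[index - 1] = Backup(True, apply_delta(victim.files, older.files))
        else:
            # Compose the two deltas; the older delta's entries take precedence.
            chain[index - 1] = Backup(False, {**victim.files, **older.files})
    del chain[index]


chain = []
add_backup(chain, {"a": "1", "b": "2"})
add_backup(chain, {"a": "1", "b": "3", "c": "4"})
add_backup(chain, {"a": "5", "c": "4"})
oldest_before = reconstruct(chain, 0)
delete_backup(chain, 1)  # remove the middle backup; deltas get merged
oldest_after = reconstruct(chain, 0)
```

In this toy model the oldest backup reconstructs identically before and
after the middle one is deleted, and the number of deltas applied in
reconstruct() is exactly the "number of merges" cost that more frequent
fills would reduce.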

> The code continues to support the 3.x storage format, so you can
> upgrade to 4.x and still access/view/restore the 3.x backups.
> However, the first backup after upgrading to 4.x will need to be
> a full to establish the first filled reference backup.

As per my earlier email, is there code to do a one-time forward
conversion of 3.x backups to 4.x backups? The goal, of course, would be
to get rid of the hardlinks while also benefiting from the more robust
md5sum checksumming of 4.x.

_______________________________________________
BackupPC-devel mailing list
BackupPC-devel@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/