Jeffrey writes:

> Craig Barratt wrote at about 00:31:58 -0800 on Wednesday, March 2, 2011:
>
>  > Jeffrey suggested I outline some of the features in 4.0 to solicit
>  > feedback and discussion.  I apologize for the delay in doing this.
>  > I've been been exceptionally busy during the last few months.
> 
> Thanks Craig -- this is awesome (I assume I am the Jeffrey you
> reference LOL).

Yes, that's you.

> I hope you don't mind my adding my comments and suggestions though
> they will certainly be less well thought through than the months of
> effort you have put into this to date...

No problem - I appreciate the feedback.  But it will take a while
to get through all the emails.

> Does this mean that the notion of incremental vs. full backup goes
> away and instead we just talk about filled vs. not-filled backups?

Not quite - what I would say is they are decoupled.  In 4.x you
could choose to do only incremental backups with rsync (ie:
perpetual incrementals).  The most recent backup is stored in
full form always.  Prior backups will typically be unfilled.
But if you have a long chain of N unfilled backups, it takes
O(N) work to reconstruct the oldest one.  So you have the
option of filling a prior backup, but only as an optimization
to make reconstructing an old backup faster.

However, rarely is an old backup viewed or restored, so it
probably doesn't matter a lot.
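The chain mechanics above can be sketched in a few lines of Python; the
dicts and names here are illustrative stand-ins for the on-disk trees,
not actual BackupPC 4.x code:

```python
# Reverse-delta reconstruction sketch.  The most recent backup is
# stored filled (complete); each older backup records only what
# differed from the next newer one.
DELETED = object()  # marker: path absent in this older backup

def reconstruct(filled, deltas):
    """Walk the chain newest-to-oldest, overlaying each delta.
    Visiting every delta in the chain is what makes reconstructing
    the oldest backup O(N) in the chain length."""
    view = dict(filled)
    for delta in deltas:  # deltas ordered newest-first
        for path, entry in delta.items():
            if entry is DELETED:
                view.pop(path, None)
            else:
                view[path] = entry
    return view

newest = {"a.txt": "v3", "b.txt": "v1"}
deltas = [
    {"a.txt": "v2"},                    # one backup back
    {"a.txt": "v1", "b.txt": DELETED},  # two backups back
]
print(reconstruct(newest, deltas))      # {'a.txt': 'v1'}
```

Filling an intermediate backup just snapshots the accumulated view at
that point, so later reconstructions can start there instead of walking
all the way back from the newest backup.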

An open question is what approach should I use for filling a
backup?  Should it be configured (ie: every Nth backup is
filled)?  Or should it be based on whether the backup was a
full or not (making it similar to 3.x)?
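The first option could look like this sketch (the fill-period parameter
is hypothetical, not an existing 4.x config setting):

```python
def should_fill(age, fill_period):
    """Decide whether the backup `age` steps back from the newest
    should be kept filled.  Age 0 (the newest) is always filled;
    otherwise fill every fill_period-th backup, which caps the delta
    chain a reconstruction must walk at fill_period - 1."""
    return age % fill_period == 0

# With fill_period = 4, ages 0, 4, 8, ... stay filled.
print([age for age in range(9) if should_fill(age, 4)])  # [0, 4, 8]
```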


> If so, does that mean that there is now a concept of how many backups
> between fills that replaces the old incremental level concept?

Not yet, but it's an open design question.
 
> And if so, does the choice of how many backups between fills really just
> boil down to a tradeoff between storage efficiency (due to duplication
> of directory trees and attrib files) vs. speed of reconstructing
> intermediate backups?

Exactly right!

> This is awesome! Certainly should be a big speedup given that disk IO
> is often the primary bottleneck.

That's right.

>  > This approach changes the deletion dependencies.  The oldest backup can
>  > be deleted at any time, and more generally the oldest backup of a chain
>  > (ie: if the next older one is filled) can be deleted at any time.  Any
>  > other backup can be deleted too, but it requires the deltas to be merged
>  > with the next older backup.  A filled backup can be deleted too, and it
>  > will be merged to create a new filled backup with the prior deltas.
> 
> Am I right in assuming that this means you are exposing perl library
> routines (and maybe even full scripts) that:
> 1. Allow one to manually convert a non-filled to a filled backup

The script I have just duplicates the most recent (filled) backup,
so that's the way a new filled backup is created.  Doing it in the
middle of the chain would be more general, but I haven't done it
that way.

> 2. Delete any backup (whether filled or unfilled) and automatically
>    fill the prior backups so that the integrity of the chain is
>    preserved.

Yes, this is done.  And yes, that's exactly what's required - you
have to merge the changes into the prior backup (if it's not filled)
and keep the reference counts correct.
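In the reverse-delta picture, deleting a middle backup amounts to
folding its delta into the next older backup's delta.  A minimal
sketch, again with dicts standing in for the on-disk trees (reference
counting is out of scope here):

```python
def merge_into_older(deleted_delta, older_delta):
    """Fold a deleted backup's delta into the next older backup's
    delta so the older backup still reconstructs identically.
    Where both touch the same path, the older delta's entry wins,
    because it is applied later when walking newest-to-oldest."""
    merged = dict(deleted_delta)
    merged.update(older_delta)
    return merged

# Deleting the backup that recorded a.txt -> "v2":
print(merge_into_older({"a.txt": "v2", "c.txt": "x"}, {"a.txt": "v1"}))
# {'a.txt': 'v1', 'c.txt': 'x'}
```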

> 3. Conversely, is there a routine that would "unfill" a filled backup
>    by converting it to a delta relative to the next most recent
>    backup? (this should be possible and could be useful in some cases)

No, this isn't done.  It's an interesting idea.

> I'm really struggling to understand what benefit there is anymore to
> the notion of a "full" backup and whether it just adds more confusion
> to have both a full vs. incremental and a filled vs. unfilled concept.

You're right.  "Full" and "Incremental" are probably the wrong
terms to use.  It really means how "thorough" the backup is.  For
rsync, currently "full" means checksum the blocks on both sides,
and incremental means just check the metadata.  But a reasonable
"full" could be to just verify the full-file checksum (and compare
it on the server like any other metadata) using the --checksum
option.  That imposes very little server load.  Or you could have
a probability configuration (ie: roll the dice) that determines
which files get the block-checksum compare, and which ones get
just the full-file comparison.
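The dice-roll idea could look like this sketch (the probability knob
and the level names are made up for illustration; they are not actual
4.x options):

```python
import random

def check_level(block_checksum_prob, rng=random.random):
    """Pick how thoroughly rsync should verify one file: the
    expensive block-checksum compare with the given probability,
    otherwise the cheap full-file checksum compare (as with
    rsync's --checksum option)."""
    return "block-checksum" if rng() < block_checksum_prob else "full-file"

# prob 1.0 always does the expensive compare; 0.0 never does.
print(check_level(1.0), check_level(0.0))  # block-checksum full-file
```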

> As per my earlier email, is there code to do a one-time forward
> conversion of 3.x backups to 4.x backups? The goal of course would be
> to get rid of the hardlinks while also benefiting from the more robust
> md5sum checksumming of 4.x?

No, not currently.  See my previous reply.

Craig

_______________________________________________
BackupPC-devel mailing list
BackupPC-devel@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
