Hi all,

Carl Wilhelm Soderstrom wrote on 21.08.2007 at 09:04:40 [[BackupPC-users] 
wishlist: full backups whenever incrementals get too large]:
> [...]
> Would it be reasonable to have backuppc check the time used by the last
> incremental against the time used by the last full, and if it's taken longer
> to do the incremental, then automatically do a full backup next time? (Of
> course, make a note in the logs as to why this was done).

no, not unconditionally.
1.) Bandwidth or backup duration may not be the primary concern. Maybe an
    individual setup can tolerate longer backups better than more server or
    client load.
2.) The duration of a backup is not an accurate measure of the amount of data
    transferred. Maybe there is a completely different reason why the
    incremental backup takes significantly longer (such as server/client load
    or even a network problem limiting bandwidth to a fraction of its normal
    value). The point is: you can't tell whether a full backup would have been
    faster under the exact circumstances of the incremental.
3.) Running a full backup in the middle of the week (or at any time it's not
    supposed to be run) may be problematic for some setups (e.g. you've tuned
    your BackupPC server to run full backups for different machines on
    different weekdays).
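For illustration, the proposal amounts to something like the following sketch
(hypothetical code, not anything in BackupPC; all names are made up):

```python
# Hypothetical sketch of the proposed heuristic -- not actual BackupPC code.
def should_force_full(last_full_secs: float, last_incr_secs: float) -> bool:
    """Force a full backup if the last incremental took longer than the last full."""
    return last_incr_secs > last_full_secs

# A one-off network problem slows a single incremental to 90 minutes while
# the last full took 60 minutes; the heuristic forces a full, although a
# full run under the same degraded conditions would likely be even slower.
print(should_force_full(3600, 5400))  # True -- but for the wrong reason
```

Which is exactly the problem: the inputs say nothing about *why* the
incremental was slow.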

Jacob wrote on 21.08.2007 at 11:59:59 [Re: [BackupPC-users] wishlist: full 
backups whenever incrementals get too large]:
> [...]
> If this is really the way backuppc does incremental backups, I think
> backuppc should be a bit more incremental with its incremental backups.

I don't, but you surely remember Craig writing on 30.04.2007 at 22:16:31 -0700
[Re: [BackupPC-users] Incremental transferring same data over and over] with
message-id [EMAIL PROTECTED] :

> Unless I'm forgetting a good reason why I did it that way, for
> rsync I should make the reference backup the most recent backup
> in all cases - both full and incremental.  It's a pretty simple
> change - I'll add it to the todo list for 3.1.0.  The logic is
> correct as is for smb and tar: the reference backup for an
> incremental always is the backup of the next lower level.

The only thing I can think of is a variant of the normal rsync incremental
"problem". The only potential misses are changed files with unchanged
attributes. You can construct cases where files would be picked up relative
to the last backup of the next lower level but not relative to the most
recent backup (because the attributes changed before the most recent backup
but not since then). I realise this may be stretching an already unlikely case
a bit far, but the point is: on full backups the optimization of using the
most recent backup as reference (for rsync!) is safe, because file contents
are rechecked by the block checksums algorithm, so you are guaranteed to get
an exact copy of the source tree in any case. For incrementals, you can't say
"compare block checksums only for some files we mistrust a bit more than
others, but not for the rest of them", so you would actually be doing a higher
level incremental backup than strictly requested. Thinking about that,
wouldn't using the most recent backup (of a higher level) as the reference
yield an incorrect (unfilled) backup tree? Wouldn't constructing the view of
the new backup necessarily require the reference backup? You can probably get
around these problems, but it doesn't exactly sound "simple" to me :-).
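To make the miss case concrete, here is a small sketch (hypothetical code, not
BackupPC's actual logic) of attribute-only change detection, as an rsync
incremental effectively does without block checksums; the timeline in the
comments matches the scenario above:

```python
# Hypothetical sketch -- attribute-only change detection, size and mtime.
def looks_changed(src: dict, ref: dict) -> bool:
    """True if the file's attributes differ from the reference backup's copy."""
    return src["size"] != ref["size"] or src["mtime"] != ref["mtime"]

# Timeline for one file:
full  = {"size": 100, "mtime": 10}  # state captured by the full backup
incr1 = {"size": 100, "mtime": 20}  # edited once; incremental 1 caught it
now   = {"size": 100, "mtime": 20}  # edited again, content changed but
                                    # size/mtime deliberately left identical

# Reference = backup of the next lower level (the full): change is noticed.
print(looks_changed(now, full))   # True  -> file gets re-examined
# Reference = most recent backup (incremental 1): change is missed.
print(looks_changed(now, incr1))  # False -> file is skipped
```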

Jacob:
> Instead of comparing against the last full, it should compare against the
> last full and incremental backups. This would solve this problem and make
> backuppc more efficient anyway, AFAIK.

There are still those of us who consider accuracy rather than efficiency the
foremost goal of backup software. And there are those who define
"efficiency" as "solving the problem in a faster way": if it's faster but
doesn't solve the problem, it's not more efficient.

Let me remind you of a few facts in case it was not clear above:

- Full backups are supposed to get an *exact* copy of everything under all
  circumstances. There are no compromises for the sake of speeding things
  up.

- Incremental backups are an optimization *that comes at a cost* of reduced
  certainty that all contents are correctly backed up. If this were not so,
  you wouldn't need full backups. The first incremental would simply be
  relative to <nothing>, therefore transferring everything. Case finished.

  With conventional tape strategies, it is simply not feasible to take a
  complete snapshot every time. It is also simply not feasible to read all
  files from tape from the previous backup(s) and compare the contents byte
  by byte. Hence incrementals, which carry the exact same risk of missing
  changes, since you only have timestamps to go by. In fact, with BackupPC
  rsync incrementals, you have *less risk* than with conventional tape
  strategies, because the decision is more elaborate than
  "mtime > last_backup_time".

- The higher the level of the incremental backup, the greater the speedup,
  but the less certain you are of not having missed some changes.
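The "mtime > last_backup_time" rule mentioned above can be sketched like this
(hypothetical code, not any particular tape tool):

```python
# Hypothetical sketch of classic timestamp-based incremental selection.
def tape_style_incremental(files: dict, last_backup_time: int) -> list:
    """Select files whose mtime is newer than the previous backup's time."""
    return [name for name, mtime in files.items() if mtime > last_backup_time]

files = {"report.doc": 120, "notes.txt": 80}
# The reference backup ran at t=100. report.doc (edited at t=120) is picked
# up; a content change to notes.txt that left mtime at 80 would be missed.
print(tape_style_incremental(files, 100))  # ['report.doc']
```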

This is why you have all the options for defining your backup strategy. It's
*your* decision as backup operator, what chances you are willing to take in
order to speed things up.

A further point: it's not "BackupPC should compare against whatever", it's
the underlying transport method that has to do it. You're clearly talking
about rsync(d), but tar/smb handle only timestamps for incrementals. Perhaps
tar/smb make the problem more clear: if you change the reference backup
(i.e. the timestamp), you alter the information you obtain. For example, you
miss files created after the full and deleted again after the most recent
incremental. Your backup will contain those files although it shouldn't. Your
backup won't be a level 1 incremental (in the simple case), it will in fact
be a level 2 incremental, possibly stored within BackupPC in a form that it
can be accessed as though it were a level 1.
The same applies for files created after the full and then moved into
another directory, or created and dated back to a time before the
incremental (except that those files will be missing in your backup although
they should be present).
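The deleted-file case can be sketched as follows (hypothetical code;
timestamp-based incrementals record no deletions, so merging layers can only
add or overwrite files, never remove them):

```python
# Hypothetical sketch: merging a full backup with incremental layers.
def merged_view(*layers: dict) -> dict:
    """Later layers override earlier ones; deletions are invisible."""
    view = {}
    for layer in layers:
        view.update(layer)
    return view

full  = {"a.txt": "v1"}
incr1 = {"new.tmp": "x"}  # new.tmp created after the full, caught by incr 1
incr2 = {}                # new.tmp deleted before incr 2; incr 2 is empty

# Level-1 view (full + a level-1 incr 2, referenced against the full):
# correct, new.tmp is gone.
print(sorted(merged_view(full, incr2)))         # ['a.txt']
# Level-2 view (full + incr 1 + incr 2, referenced against incr 1):
# new.tmp wrongly survives in the merged view.
print(sorted(merged_view(full, incr1, incr2)))  # ['a.txt', 'new.tmp']
```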

Although rsync misses fewer changes, it *can* miss some, and the same thought
applies, although it is more difficult to describe. You simply don't get the
benefits of a level 1 incremental over a level 2 incremental, if you *do* a
level 2 incremental. If you don't need those benefits, then set up a level 2
incremental in the first place!

Jacob wrote on 24.08.2007 at 08:50:20 [Re: [BackupPC-users] wishlist: full 
backups whenever incrementals get too large]:
> [...]
> Maybe it's time for new principles? ;)

Sure, when strategies are well understood and proven by long time experience,
it's time to mandatorily switch everyone from one strategy to another because
the second one is faster :-).

> With large files, though, it is an absolute time- and space-waster.

Go and re-read the BackupPC documentation and then tell me where *space* is
wasted. Unless you are confusing "space" with "space divided by time"
(bandwidth).

Though I'm repeating myself: concerning "waster", it's only a waste if you
know beyond doubt that it's unnecessary, which, *in the general case*, you
don't.
As a human, you might know that you copied foo.iso to bar.img, so it's a waste
to transfer bar.img when you know you already have the contents on the other
side. Backup software can't know that, so it's only doing its job in the way
it's designed to.

> I could easily see myself wanting to backup a 2GB .ISO, but wouldn't want
> it to take 4x the actual size of the ISO just because of the way it's backed
> up. :s

You've got something wrong there.

Regards,
Holger
