In the message dated: Wed, 09 Aug 2006 11:21:44 +0200,
The pithy ruminations from Casper Thomsen on 
<Re: [BackupPC-users] Keep the last n revisions of files> were:
=> On Wed, 9 Aug 2006, Ralf Gross wrote:
=> (...)
=> >> What would be really great to have is the possibility to ensure that I
=> >> have the last n revisions of files; no matter how many fulls or
=> >> incrementals.

Interesting idea, probably best suited to a revision control system.

=> (...)
=> >
=> > I also think this would be the job of a revision control system.
=> 
=> Or the job of a really smart backup system ;-).
=> 
=> (...)
=> >> Any pointers, good ideas, work-arounds or whatever is of course
=> >> appreciated. Thanks in advance!
=> >
=> > How will you ensure that a file has not changed several times since 
=> > the last backup? How often would you have to run your backup to catch 
=> > every existing version of that file?
=> 
=> I will not ensure that the file has not changed several times between 
=> backups. My formulation was inexact, sorry! What I want is the last n 
=> revisions of files as seen when they are checked for changes (once a 
=> day, once a week, or whatever). I only want the last n backed-up 
=> revisions, not every "real" revision of the file.

OK. Do you plan to list specific files for which revisions should be retained 
(in which case there's much more overhead and a more complicated config, but 
the storage requirement would be lower), or apply the revision settings to 
every file?

If the former (a list of files), then it sounds like something that's best 
handled via a revision control system. In the simplest sense, you could 
check in files manually. This could also be automated, so that when BackupPC 
connects to the client to initiate a backup, it runs a script first. The 
script would traverse the filesystem (i.e., using "find -newer") or check 
specified files, and automatically check them in to a revision control 
system running on the client prior to the backup.
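
Something like the following could be wired into BackupPC's 
$Conf{DumpPreUserCmd} hook (e.g., invoked over ssh so it runs on the 
client). This is only a sketch: the watched tree, the timestamp file, and 
the choice of RCS are all assumptions for illustration.

    #!/bin/sh
    # Pre-backup check-in sketch, run on the client before each dump.
    # WATCH, STAMP, and the use of RCS are illustrative assumptions.
    WATCH=/etc                          # tree to scan for changes
    STAMP=/var/lib/backup/last-checkin  # create by hand before first run

    # Check in every regular file modified since the last run; RCS's
    # "ci -l" records a new revision and keeps the working file in place.
    find "$WATCH" -type f -newer "$STAMP" \
        -exec ci -l -t-"auto check-in" -m"pre-backup check-in" {} \;

    touch "$STAMP"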

If you're thinking about applying the concept of saving revisions to the whole 
system, then it sounds more like you want a filesystem "snapshot", rather than 
a file-by-file revision history. This is much more common in backup systems, 
and would give you the ability to restore the entire system (or individual 
files) to a specified point in time.
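
On Linux, for instance, an LVM snapshot gives you exactly that kind of 
point-in-time image; a sketch with hypothetical volume names, where the 
backup then reads the frozen snapshot instead of the live filesystem:

    # Create a read-only, point-in-time view of the "home" volume
    # (volume group, size, and mount point are hypothetical).
    lvcreate --snapshot --size 1G --name home-snap /dev/vg0/home
    mount -o ro /dev/vg0/home-snap /mnt/snap

    # ... back up /mnt/snap here ...

    umount /mnt/snap
    lvremove -f /dev/vg0/home-snap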

Be aware that there are many files that change often for which you probably 
don't need or want successive revisions (caches, mail spool files, mailboxes, 
config files that maintain a list of "last used files", etc.).
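
In BackupPC that's what $Conf{BackupFilesExclude} is for; the same idea 
expressed as rsync exclude patterns (the patterns and paths here are only 
examples, not a recommended list):

    # Skip churn-prone files that aren't worth keeping revisions of.
    rsync -a \
        --exclude='/var/spool/mail/' \
        --exclude='.mozilla/*/Cache/' \
        --exclude='*~' \
        /home/ backupserver:/backups/home/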

=> 
=> Just to make it totally clear (I hope): If a file has changed when it is 
=> being backed up, and there are fewer than n revisions of the file backed 
=> up, do nothing special (just back it up as usual); otherwise, back it up 
=> and delete the oldest revision of the file.

That sounds computationally expensive, and would significantly increase both 
storage and I/O. Without a database backend to track file versions (the 
number of revisions, and which backup each one belongs to), it would be 
extremely impractical.
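
To see why: the per-file policy is only cheap when every revision is a plain 
file in a flat directory, as in the sketch below (the archive layout and N=5 
are hypothetical). Repeating it for millions of pooled, compressed files on 
every run is where it falls apart.

    # Keep only the newest N archived copies of a single file.
    N=5
    ls -t /archive/etc/passwd.* | tail -n +$((N + 1)) | xargs -r rm -f
    # ("ls -t" sorts newest first; "tail -n +$((N+1))" selects everything
    # after the first N; GNU xargs -r skips rm when nothing is left over.)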

You're describing a much more traditional backup system, where each backup is 
stored on separate volumes (CDs, disk-based files, tapes, etc.), and each 
backup has its own file list and expiration period. This has some advantages 
(and disadvantages) compared to BackupPC, and would be much better suited to 
your revision scheme.

=> 
=> Actually there would be other possibilities: you could also (1) set a bit 
=> that indicates that the file can be deleted, or (2) delete all the 
=> oldest revisions so that exactly n are left, and maybe some other 
=> strategy I haven't thought of yet.

Again, that sounds like the act of expiring all backups older than a given 
date.
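
BackupPC itself expresses that whole-backup form of expiry with 
$Conf{FullKeepCnt} and $Conf{IncrKeepCnt}; as a standalone sketch of the 
same idea (the one-directory-per-backup layout is hypothetical, not 
BackupPC's pool):

    # Drop any backup directory older than 30 days.
    find /backups -mindepth 1 -maxdepth 1 -type d -mtime +30 \
        -exec rm -rf {} +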

Consider looking at a "traditional" backup system (i.e., one not built on 
BackupPC's concept of pooling and a single storage "volume"), such as Amanda 
or Bacula.

=> 
=> The different approaches imply new decisions.
=> 
=> Ad (1). When should it be deleted? Should it be decompressed and deleted 
=> immediately, once a day, during BackupPC_nightly runs, or only when all 
=> (or x percent) of the files in the compressed file are set for removal?

That would add huge overhead. It's much more efficient to deal with an entire 
backup on a given day as a single "revision".

=> 
=> Ad (2). Maybe it would be possible to have a flag, or even the 
=> possibility of specifying an "algorithm", to decide how many revisions 
=> should be deleted (dependent on how often the file changes, how many 
=> revisions there are, how recently the revisions changed (roughly 
=> uniformly spaced or not), etc.). This seems, admittedly, quite strange.

Interesting. I like the idea of the dynamic algorithm. This is similar to 
Amanda, in that it dynamically chooses which filesets to back up, based on 
the queue and backup frequency. However, I see this as having limited 
application. The idea of keeping successive revisions, in addition to basic 
backups, seems to be at odds with the idea that older revisions would be 
deleted dynamically.

Mark

=> 
=> > Ralf
=> 
=> 
=> -- 
=> Casper Thomsen
=> 

-----
Mark Bergman    Biker, Rock Climber, Unix mechanic, IATSE #1 Stagehand

http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com

I want a newsgroup with an infinite S/N ratio! Now taking CFV on:
rec.motorcycles.stagehands.pet-bird-owners.pinballers.unix-supporters
15+ So Far--Want to join? Check out: http://www.panix.com/~bergman 


