On Thu, 11 Sep 2008, Holger Parplies wrote:

> Hi,
>
> Ludovic Drolez wrote on 2008-09-11 16:55:18 +0200 [Re: [BackupPC-users]
> Bug#497888: backuppc: please make use of the rsync algorithm, particularly in
> resuming interrupted backups]:
> > On Fri, Sep 05, 2008 at 05:09:19PM +1000, Tim Connors wrote:
> > > It would be very nice if the backuppc client communicated as a regular
> > > rsync client back to the rsync server, and didn't wipe the tree that
> > > had already been partially transferred. So when 400MB of a 600MB file
> > > has been successfully transferred, only the delta gets transmitted the
> > > next time. Yes, of course, this behaviour should be configurable so
By the way, for the backuppc-users people, the rest of this bug report was
filed as a Debian bug and is viewable here in complete form:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=497888.html

> > Yes backuppc should work this way. You may have a configuration
> > problem somewhere...
>
> actually, I believe BackupPC *does* remove the in-progress file at the time
> the backup failed. This is logical in that you don't have a [partial] backup
> that reflects an incorrect state of a file (i.e. if it's there it's correct) -

- a random file from the user's point of view, as I think should be pointed
out. However, previous backups are in the pool as
/var/lib/backuppc/pc/<hostname>/<n> (and hardlinked into the common pool
/var/lib/backuppc/cpool/), whereas the incompletely transferred filesystem is
in /var/lib/backuppc/pc/<hostname>/new.

Of course you wouldn't serve up the incompletely transferred filesystem as a
valid backup. Of course you would complete that transfer before renaming it
and linking it into the preexisting pool. Subsequent resumes of such a backup
would naturally be done with the --inplace and --delete flags, so that the
remote rsync could tell the backuppc process which files had been deleted in
the meantime, and --inplace would take care of these large but incompletely
transferred files. Or, instead of --inplace, since backuppc is acting as the
rsync client, at least record which temporary file you used so that you can
just get the delta.

> There is no easy way to "see" with which file in-progress the backup failed
> when you're navigating the file tree, so you can't easily account for
> incorrect file contents.

But why would you serve the incompletely transferred
/var/lib/backuppc/pc/<hostname>/new pool out? You don't currently, do you?
You just say the backup is in progress, here's the partial logfile.
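The pool layout being described boils down to hardlinking completed files
into a common content-addressed pool. Here's a rough sketch of that idea
(illustrative paths and a made-up naming scheme only - this is not
BackupPC's actual code):

```python
import hashlib
import os
import tempfile

# Sketch: once a file has finished transferring into the in-progress
# tree (new/), it gets hardlinked into a content-addressed pool so that
# identical files across backups share the same disk blocks.
root = tempfile.mkdtemp()
new_dir = os.path.join(root, "pc", "denman", "new")
cpool = os.path.join(root, "cpool")
os.makedirs(new_dir)
os.makedirs(cpool)

# A file that finished transferring into the in-progress tree.
path = os.path.join(new_dir, "fSent")
with open(path, "wb") as f:
    f.write(b"mbox contents")

# Link it into the pool under a digest of its contents (BackupPC uses a
# content hash too, but the exact scheme here is invented for the sketch).
with open(path, "rb") as f:
    digest = hashlib.md5(f.read()).hexdigest()
pool_path = os.path.join(cpool, digest)
os.link(path, pool_path)

print(os.stat(path).st_nlink)  # 2: the file is now pooled
```

The point is that linking happens only after the transfer completes; until
then the file exists solely under new/ with a link count of 1.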
> As for used disk space on the BackupPC server, the
> partial file would be unlikely to match a pool file, but it would be removed
> again once a larger partial completes, so keeping it would not overly
> "pollute" the pool.

No, but the partial file would still exist in, in my case,

-rw-r----- 1 backuppc backuppc 275990254 Sep 5 14:45 /pcbackups/pc/denman/new/f%2fwindows/fDocuments and Settings/fAdministrator/fApplication Data/fThunderbird/fProfiles/f5a5c3e4f.default/fMail/fLocal Folders/fSent

if you didn't then go and explicitly delete it when cleaning up after the
link failure. At this stage of the backup - an abort due to the rsync link
failure - you can see that the file is not yet linked into the pool.

> The other thing to note is that partial backups are only saved for full
> backups.

So if I do a full backup of all of the filesystems on mum's computer, it
will get to the 4th filesystem with this problematic mbox file, fall over at
that point when she turns the computer off in 12 hours, and then the next
full backup will start from this file, and not have to redo the backup of
the other 3 filesystems first (only to delete them all later when it
realises it can't complete this backup)? Most likely, it would still have to
start from the start of this file though, because the first time around it
still has to get all of those 650MB across the net, as it doesn't yet have
an earlier version to work from. Which means it's still going to fail some
400MB or so into the transfer, and I'll incur another 400MB cost to mum's
quota.

> When an incremental backup is aborted, no partial is saved.

But you're not serving these incompletely transferred full backups out as
valid backups (complete with partially transferred files - the gist of my
bug report), are you? So what am I missing? How are you retaining the
incompletely transferred full backup without serving it out? Why are fulls
different to incrementals?
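That "not yet linked in" observation is checkable from the link count: a
file that made it into the pool has at least two hardlinks, while an
orphaned partial in new/ has exactly one. A small sketch (made-up paths,
not BackupPC code):

```python
import os
import tempfile

# Sketch: a partial file left in new/ after an aborted transfer has a
# link count of 1, because the link pass into the pool never ran for it.
root = tempfile.mkdtemp()
partial = os.path.join(root, "fSent")
with open(partial, "wb") as f:
    f.write(b"\0" * 1024)  # stand-in for the 275 MB partial mbox

# A cleanup or resume pass could tell pooled files from orphaned partials:
status = ("not yet linked into the pool"
          if os.stat(partial).st_nlink == 1
          else "pooled")
print(status)
```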
> This
> makes sense, because an incremental backup is based on exactly one reference
> backup - the full backup in the simple and usual case. Technically, you could
> use a merged view of the reference backup and the partial backup as a starting
> point, but this would - strictly speaking - make it a level N+1 incremental
> instead of the requested level N.

Well, because you don't freeze the filesystem during a single backup, it's
just the same as a potentially very long single backup :) Respawn the rsync
process a second time, and communicate with it in such a way that fools it
into thinking it's starting the transfer from the point at which it left off
last (or, better yet, your merged-view explanation).

> This may be splitting hairs for rsync, and
> there may be far better reasons for not using partials on incrementals that I
> am missing.

For rsync, such a view would seem eminently sensible to me. For tar etc.,
you're still not transferring the whole filesystem in an incremental run -
you're just asking it to transfer new files, aren't you? So you can still
compare them against the merged view when working out what needs to be
transferred (not that you'd use tar across the internet).

> Tim Connors wrote in 497888 on bugs.debian.org:
> > Furthermore, if it's not running the delta algorithm over an
> > interrupted backup, I'm guessing that a regular incremental backup run
> > involves the backuppc server transferring whole files corresponding to
> > those files that had changed compared to a previous version, and then
> > running the comparison against the pool to work out which files can be
> > linked.
>
> That is not the case. Though you are misinterpreting '--inplace' (which only
> means command line rsync does not make an intermediate copy of the file(s)),

No, what I was trying to imply was just that you should have a reproducible
filename for the incompletely transferred file.
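To make the "only the delta gets transmitted" point concrete, here's a toy
block-matching scheme in the spirit of the rsync algorithm - this is a
deliberately simplified sketch (fixed-size MD5 block signatures), not
rsync's actual rolling-checksum protocol:

```python
import hashlib

BLOCK = 4  # toy block size; real rsync uses blocks of ~700 bytes and up

def signatures(old):
    """Checksums of each block of the receiver's old copy of the file."""
    return {hashlib.md5(old[i:i + BLOCK]).hexdigest(): i
            for i in range(0, len(old), BLOCK)}

def delta(new, sigs):
    """Walk the sender's new file; emit block references or literal bytes."""
    ops, i = [], 0
    while i < len(new):
        h = hashlib.md5(new[i:i + BLOCK]).hexdigest()
        if h in sigs:
            ops.append(("ref", sigs[h]))      # receiver already has this
            i += BLOCK
        else:
            ops.append(("lit", new[i:i + 1])) # must go over the wire
            i += 1
    return ops

# A slowly growing mbox: the old bytes (the "400MB") are matched as block
# references, and only the appended tail is sent as literals.
old = b"AAAABBBBCCCC"          # already on the server
new = old + b"NEW!"            # the mbox grew
ops = delta(new, signatures(old))
sent = sum(len(b) for op, b in ops if op == "lit")
print(sent, "literal bytes instead of", len(new))  # 4 instead of 16
```

That is the behaviour being asked for across an interrupted transfer: keep
the 400MB already received, and let the signature exchange reduce the next
attempt to the missing tail.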
If the backuppc "rsync client" just acts in the default rsync way of
creating a temporary file, then you can't find that temporary file the next
time you want to resume the backup, so you need to use something like
--inplace to get it to write to a predictable location. Of course, acting
as your own client that just happens to talk the rsync protocol allows you
to do whatever you like - I'm just advocating that the best course of
action is to not delete the incompletely transferred file.

> BackupPC does do exactly what you want: determine a sensible reference file
> and transfer deltas on both full and incremental runs. Note that the
> reference for a (level 1) incremental is the preceding full backup, so you
> may be re-transferring the same delta multiple times, but for your case of a
> slowly growing mbox file, this is only a problem if you have a *long* time
> between full backups. Bandwidth-wise full backups may be cheaper than
> incremental backups, as has been explained on the mailing list many times.

I'll have to go back to that. I'd be wondering what the point of an
incremental is over a full backup in the rsync case, if bandwidth efficiency
is maximised and you still get the benefit of pooling files with full
backups. Anyway, as long as I make sure the automated dailies from this
particular remote host are full backups, then it's probably going to do
what I want. But until I can get the first transfer done after a major
filesystem reorganisation on my mum's computer, I won't know! Hence the
main thrust of this bug report.

> In particular, if you currently have only one very old full backup with your
> mbox file only a few KB in size, a level 1 incremental will transfer almost
> all of the file, while a full would transfer only the delta since the last
> incremental (!).
>
> Regards,
> Holger

Ciao.

-- 
TimC
Ken Thompson claims that he started developing Unix so he could play Space
War, but the end product shows he was really much more interested in
cheating at Scrabble.
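What a predictable partial location buys you can be sketched like this. The
helper names and the append-only resume are hypothetical (real rsync would
run its delta algorithm over the partial, not just append), but it shows
why the partial must survive at a known path:

```python
import os
import tempfile

def resume_offset(partial_path):
    """If a partial from a failed run exists at a known, fixed path,
    resume by requesting only the bytes beyond its current size."""
    try:
        return os.path.getsize(partial_path)
    except FileNotFoundError:
        return 0  # no partial: start from byte 0

def fetch(source, partial_path):
    """Append the missing tail of `source` onto the partial file and
    return the number of bytes that actually crossed the wire."""
    off = resume_offset(partial_path)
    with open(partial_path, "ab") as f:
        f.write(source[off:])  # only the un-transferred bytes are sent
    return len(source) - off

root = tempfile.mkdtemp()
partial = os.path.join(root, "fSent.partial")  # the predictable location
mbox = b"x" * 600  # stand-in for the 600MB file

# First attempt dies after 400 "MB":
with open(partial, "wb") as f:
    f.write(mbox[:400])

# Second attempt finds the partial at the known path, sends only 200:
print(fetch(mbox, partial))  # 200
```

If the partial instead lived at a randomized temporary name (or was deleted
on abort), resume_offset would return 0 and the full 600 would be
re-transferred - which is exactly the complaint in this bug report.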
--Steve VanDevender

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]

