David wrote:
> On Tue, Aug 18, 2009 at 5:35 PM, Les Mikesell<lesmikes...@gmail.com> wrote:
>> Why not just exclude the _TOPDIR_ - or the mount point if this is on
>> its own filesystem?

> Because most of the interesting files on the backup server (at least
> in my case) are the files being backed up. I'm a lot more interested
> in being able to quickly find those files than random stuff under
> /etc, /usr, etc.

Yes, and this is something I'd like to have in backuppc (please find a
file on any host, in any backup number, with the string abc in its
filename). This isn't possible without using the standard tools like
find, and waiting for it to traverse all the directories and backups,
etc. (Well, you could use grep on the logfiles to find it, which would
probably be faster.)
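
Something like this is roughly what that brute-force search looks like
(untested, and assuming the usual Debian locations for TopDir and the
BackupPC helper scripts; adjust for your install). Remember that file
names stored under pc/ are mangled with a leading "f" in BackupPC 3.x:

    # Walk every backup of every host looking for "abc" in a file name
    # (slow; it has to traverse the whole pc/ tree):
    find /var/lib/backuppc/pc -name 'f*abc*'

    # Usually faster: search the compressed transfer logs instead.
    for log in /var/lib/backuppc/pc/*/XferLOG.*.z; do
        /usr/share/backuppc/bin/BackupPC_zcat "$log" | grep -H --label="$log" abc
    done
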
>> There's not a good way to figure out which files might be in all of
>> your backups and thus not help space-wise when you remove any
>> instance(s) of it. But the per-host, per-run stats where you can see
>> the rate of new files being picked up and how much they compress is
>> very helpful.

> Thanks for this info. At least with per-host stats, it's easier to
> narrow down where to run du if I need to, instead of over the entire
> backup partition.
>
> A couple of random questions:
>
> 1) How well does BackupPC work when you manually make changes to the
> pool behind its back? (like removing a host, or some of the host's
> history, via the command line). Can you make it "resync/repair" its
> database?

Removing hosts or individual backups doesn't affect the pool, and in my
experience this works just fine, although I would advise against doing
it, simply because you never know exactly what might get stuffed up.
I've had a remote client rename about 10G of images, so I simply did a
cp -al from the previous full backup into the current partial (aborted
full) backup, and then continued the full backup. It then noticed all
the old filenames were gone, found the new filenames were already
downloaded (hardlinked really), and continued on nicely. I've also
deleted individual files (vmware disk image files, dvd images, etc.)
and not had a problem. Of course, if you are going to do things like
that, you should try and use the tools that have recently been written
to help do this properly.

> 2) Is there a recommended approach for "backing up" BackupPC databases?
> In case they go corrupt and so on. Or is a simple rsync safe?

Stop backuppc, umount the partition, and use dd to copy it to another
partition; or else use RAID1 with three members: stop backuppc, umount,
remove a member, and you have your backup. Rsync *should* work fine for
smaller pools/numbers of files, as long as you have lots of RAM on both
ends. Eventually, you will reach a pool size (number of files) where it
will stop working.

> 3) Is it possible to use BackupPC's logic on the command line, with a
> bunch of command-line arguments, without setting up config files?

No, not really.

> That would be awesome for scripting and so on, for people who want to
> use just parts of its logic (like the pooled system for instance),
> rather than the entire backup system. I tend to prefer that kind of
> "unix tool" design.

You really sound like a programmer <EG> (yes, I have read the rest of
your post already)... After configuring backuppc, there are some things
you can do to basically cancel out all the automated features of
backuppc and just use its pieces manually. Though I think if you
actually used backuppc normally first, you would be unlikely to want to
do this.
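
For example (just a rough, untested sketch; the paths are the Debian
ones and "myhost" is a placeholder): with $Conf{WakeupSchedule} emptied
so the server never queues anything on its own, you can still drive
backups by hand as the backuppc user:

    # Queue a backup through the running server
    # (last argument: 1 = full, 0 = incremental):
    su - backuppc -s /bin/sh -c \
      "/usr/share/backuppc/bin/BackupPC_serverMesg backup myhost myhost backuppc 1"

    # Or run the dump and pool-linking steps yourself:
    su - backuppc -s /bin/sh -c "/usr/share/backuppc/bin/BackupPC_dump -f -v myhost"
    su - backuppc -s /bin/sh -c "/usr/share/backuppc/bin/BackupPC_link myhost"
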
>> Of course, but you do it by starting with a smaller number of runs
>> than you expect to be able to hold. Then after you see that the space
>> consumed is staying stable you can adjust the amount of history to
>> keep.

> Ah right. I think this is a fundamental difference in approach. With
> the backup systems I've used before, space usage is going to keep
> growing forever, until you take steps to fix it. Either manually, or
> by some kind of scripting, and so far I haven't added scripting, so I
> rely on du to know where to manually recover space.
>
> Basically, I was using rdiff-backup for a long time. That tool keeps
> all the history, until you run it with a command-line argument to
> prune the oldest revisions.

You specify in advance how many incremental and full backups you want,
what period you want to keep them on, etc. Then backuppc *can*
automatically prune the relevant backups to keep what you have asked
for. One specific point is that you can keep your daily (incremental)
backups for the past month, then every second one for two months, and
all fulls (weekly) for the past 6 months, every 4th full for the past
two years, etc.

> And also, I don't see a great need to pro-actively recover space most
> of the time. The large majority of servers/users/etc have a relatively
> small amount of change. So it's kind of cool to be able to get *any*
> of the earlier daily snapshots, for the last few years.

I never recover space on any of my backuppc servers either, but
sometimes I increase the number of backups I want to keep :) Yes, some
things are cool, but they are rarely useful... However, I have one
customer whose backuppc server keeps *every* backup it has ever
completed, and that has been running for over 3 years now.

> Although ironically, the servers with the largest amount of churn (and
> harddrive usage on the backup server) are the ones you'd actually want
> to keep old versions for (like yearlies, monthlies, etc). But with
> rdiff-backup, that isn't really possible without some major repo
> surgery :-). You end up throwing away all the oldest versions when
> space runs low.

Which is the problem with those tools. Sometimes you want to keep the
backup from 7 years ago, but you don't really need every daily backup
for the past 7 years! This is where backuppc is quite helpful...

> Also, I'm influenced by revision control tools, like git/svn/etc. I
> don't like to throw away old versions, unless it's really necessary.

When it is necessary, do you always want to throw away the oldest
version, though?

> And, if you have a lot of harddrive space on the backup server, then
> may as well actually make use of it, to store as many versions as
> possible. And then only remove the oldest versions where needed.

Again, you might not want to remove the oldest; you might want to
remove some of the in-between backups...

> The above backup philosophy (based partly on rdiff-backup limitations)
> has served me well so far, but I guess I need to unlearn some of it,
> particularly if I want to use a hardlink-based backup system.

Or just get more disk space...

> If rsync is used, then what is the difference between an incremental
> and a full backup?

Basically, the full will read every file on the client and the backuppc
server, and compare checksums. The incremental will skip this full
checksum comparison.

> ie, do "full" backups copy all the data over (if using rsync), or
> just the changed files?

No, both full and incremental will only transfer the modified portions
of the modified files (if using rsync).
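
As a rough analogy in plain rsync terms (not literally the commands
backuppc runs, just to illustrate the difference; paths and host are
made up):

    # "full": every file is read on both ends and checksummed, but only
    # the changed blocks actually cross the network.
    rsync -a --ignore-times /data/ backupserver:/snapshots/myhost/

    # "incremental": files whose size and mtime are unchanged are skipped
    # without being read at all.
    rsync -a /data/ backupserver:/snapshots/myhost/
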
> And, what kind of disadvantage is there if you only do (rsync-based)
> incrementals and don't ever make full backups?

In the older versions (which my above client started with, and this is
the config I started with), an incremental backup would compare the
remote client with the last *full* backup, so over time you needed to
transfer more and more data over the network. In current versions, you
can back up against the last incremental of a lower level (e.g. you can
do levels of [0,1,0,0,2,1,1,3,2,2,4,3,3,5,4,4,6] or whatever you like;
I'm not sure how many levels or entries can be included there). After
working out how this affected backuppc (along with the huge amount of
extra work to "fill in" the backups in the web interface), I just
configured full backups every 3 days. The only real difference between
a full and an incremental is the amount of IO load and CPU load on the
client (and the backuppc server), and hence the time it takes to
complete a backup. You really should schedule a regular full backup
anyway. Also, another reason for regular full backups is so you don't
need to keep every full backup; you can drop every second (or every
fourth, etc.) full to recover space...

> My angle is that Linux sysadmins have certain tools they like to use,
> and saying they can't use them effectively due to the backup
> architecture is kind of problematic.

It isn't that they can't be used... they are just slow, and there are
more efficient methods to obtain the same information. I could use find
or grep or du on my massive maildirs, but they suck, and there are
other methods to get some of the answers I need; other times, I have to
use du/find/etc...

> Probably I need to think more about using a more traditional scheme
> (keep a fixed number of backups, X daily, Y weekly, Z monthly, etc),
> instead of "keep versions forever, until you need to start recovering
> harddrive space".

You can still keep versions forever, just set the keepcnt values to
very high values... 15 years, or 50 years, etc. The difference is that
with backuppc you have more flexibility on *which* backups you remove
to recover space. Consider the common case of a growing log file: you
back up every day, and the file is rotated each month. So you have 30
versions of the same file, yet you don't really need 29 of them, since
all the data is included in the last/30th one... lots of examples I'm
sure you can think of :)

> But the problem I see is this:
>
> (From BackupPC docs)
>
> "Therefore, every file in the pool will have at least 2 hard links
> (one for the pool file and one for the backup file below
> __TOPDIR__/pc). Identical files from different backups or PCs will all
> be linked to the same file. When old backups are deleted, some files
> in the pool might only have one link. BackupPC_nightly checks the
> entire pool and removes all files that have only a single link,
> thereby recovering the storage for that file."
>
> Therefore, if you want to keep tonnes of history (like, every day for
> the past 3 years), for a server with lots of files, then it sounds
> like you need to actually have a huge number of filesystem entries.

Yes, but is that a problem? With 5 hosts being backed up, I have 401
full backups and 3303 incremental backups, using 36TB of storage prior
to pooling and compression (ie, if we didn't have hardlinks or
compression). We have approx 1.9M unique files in the pool using only
680GB of disk space. I'm not sure how to calculate the actual number of
inodes used (df -i doesn't seem to work as we are using reiserfs; I'm
sure you would get major issues doing this on ext2/3, etc.).
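
If you really wanted that number, something like this should get close
(untested, slow on a big pool, and it assumes the usual
/var/lib/backuppc layout with a compressed pool under cpool/); GNU
find's %n prints the link count of each file:

    # Count pool files and the total number of hard links pointing at them:
    find /var/lib/backuppc/cpool -type f -printf '%n\n' |
        awk '{files++; links+=$1} END {printf "%d pool files, %d links total\n", files, links}'

    # On ext3/ext4 the short answer is simply:
    df -i /var/lib/backuppc
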
> I think if I wanted to use BackupPC, and still be able to use du and
> friends effectively, I'd need to do some combination of:
>
> 1) Use incrementals for most of the backups, to limit the number of
> hardlinks created, as Les Mikesell described.
>
> 2) Stop trying to keep history for every single day for years (rather
> keep 1 for the last X days, last Y weeks, Z months, etc).

Or just be more patient with how long those tools take to run, and
realise that they might stop working one day if your pool/etc gets too
big...

> This would also mean having to spend less time managing space.
> Although at the moment it only comes up every few weeks/months, and
> had been pretty fast with du & xdiskusage, at least until I switched
> over from rdiff-backup to a "make a hardlink snapshot every day"
> process :-(.

Or just get more disk space :)

> And furthermore, hardlink-based storage does cause ambiguous du
> output, even if the time it took to run wasn't an issue. Which is
> another thing about hardlink-based backups which annoys me (compared
> to when I was using rdiff-backup), and one of the reasons why I'm
> currently running my own very hackish "de-duping" script on our backup
> server.

Or is it that you don't know the right tool for this job which annoys
you? (a little sarcasm :)

> Nice that BackupPC maintains these stats separately. Although kind of
> annoying (imo), that you have to go through its frontend to see this
> info, rather than being able to tell from standard linux commands (for
> scripting purposes and so on).

As far as I know, the format of the files this information is stored in
is well documented, and as such you could write scripts to your heart's
content to read/parse these simple text files, and get any information
you desire...

> And also it bothers me that those kind of stats can potentially go out
> of synch with the harddrive (maybe you delete part of the pool by
> mistake).

Ummm, don't make mistakes :) or if you do, fix the stats...

> Is there a way to make BackupPC "repair" its database, by re-scanning
> its pool? Or some kind of recommended procedure for fixing problems
> like this?

I am pretty sure there are no such tools... you either live with it
until the relevant backups are purged, or you manually stuff around,
potentially making the problem even worse (ie, messing it up in a way
that you don't know you have messed it up, as opposed to knowing it is
wrong).

>> As a side note, are you letting available space dictate your
>> retention policy? It sounds like you don't want to fund the retention
>> policy you've specified, otherwise you wouldn't be out of disk space.
>> Buy more disk or reduce your retention numbers for backups.

> And since we have a fairly large backup server (compared to the
> servers being backed up), I let the older backups build up for a while
> to take advantage of the space, and then free a chunk of space
> manually when the scripts email me about space issues.
>
> But now I can't "free a chunk of space manually" that easily any more,
> since "du" doesn't work :-(.

rm -rf TopDir/pc/host/nnn, where nnn is a random incr backup number or
a full backup which no remaining incr relies on, seems to work pretty
well. Though I'd advise adjusting the values in the config file and
letting backuppc purge the backups itself.
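
If you do go the manual route, the per-host "backups" index is the
place to check first; it is a plain tab-delimited text file (the first
two fields are the backup number and type), so you can see which
backups are fulls before removing anything. A rough, untested sketch,
assuming the usual TopDir and a placeholder host and backup number:

    host=myhost; n=123
    # List backup number and type for this host, so you can check that no
    # remaining incremental depends on the full you are about to remove:
    awk -F'\t' '{print $1, $2}' /var/lib/backuppc/pc/$host/backups

    # Remove the backup tree; BackupPC_nightly will later reclaim any pool
    # files that are left with only a single link:
    rm -rf /var/lib/backuppc/pc/$host/$n
    # You probably also want to delete the matching line from the "backups"
    # file so the web interface stays consistent (my assumption; test first).
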
> Well, the good news is that nobody here seems to care about the
> backups much, until the moment they're needed. The fact we have them
> at all is kind of a bonus D:. At least I'm starting to get the boss
> (we're a pretty small company) on my side. Just that nobody besides
> myself has time to work on things like this.

Once you lose all the data, everybody will have plenty of time :) You
can't afford not to have good backups! (But hey, *we* all know that...)

One other thing that should be considered: the point of using backuppc
is that lots of other people use it, and have checked that there are no
bugs, etc., in it. As such, we are somewhat certain that we will get
back the correct data as long as we treat it correctly (don't fiddle
with its storage behind its back)... Home-grown scripts/programs can be
hugely rewarding, but you will never get the same reliability/certainty
about the software. Of course, you also have to write all the
improvements yourself, instead of just downloading the new version that
someone else was nice enough to write for you :)

> PS: Random question: Does backuppc have tools for making offsite,
> offline backups? Like copying a subset of the recent BackupPC backups
> over to a set of external drives (in encrypted format) and then taking
> the drives home or something like that.

Yes, you can archive backups... One of my customers plugs in an esata
drive, crontab runs a script to mount the drive, create the tar files
of the most recent backups onto a staging (internal raid array) area,
delete the files from the external disk, then copy the new tar files
onto the esata, and finally delete the files from the staging area...
Lots of checks/etc. to make sure we are doing the correct things, and
alerts (or OKs) are reported back to the monitoring system as needed.

> Or alternately, are there recommended tools for this? I made a script
> for this, but want to see how people here usually handle this.

This is where custom scripts/plugins are best utilised. A single
program can't determine the possible needs of every user... :)

I hope the above information is useful to you. Please note it is just
my wordy opinion, and probably hardly worth the electrons used to
display it. Please recycle them thoughtfully...

Regards,
Adam