Timothy J. Massey wrote:
> > > As a start, how about a utility that simply clones one host to another
> > > using only the pc/host directory tree, and assumes that none of the
> > > source files are in the pool, just like it would during a brand-new
> > > rsync backup?
> >
> > That would be better than nothing, but if you have multiple full runs
> > that you want to keep you'll have to transfer a lot of duplicates that
> > could probably be avoided.
>
> Correct. But it's a proof of concept that can be refined. I understand
> that some sort of inode or hash caching is required. But the first step
> can be done with the parts we've already got.
Agreed, but it's a lot easier to design in the out-of-band info you'll
need later than to try to figure out where to put it afterwards.

> > But what is the advantage over just letting the remote server make its
> > run directly against the same targets?
>
> I thought a lot of Holger's points were good. But for me, it comes down
> to two points:
>
> Point 1: Distributing Load
> ===========================
> I have hosts that take, across a LAN, 12 hours to back up. The deltas
> are not necessarily very big: there's just *lots* of files. And these
> are reasonably fast hosts: >2GHz Xeon processors, 10k and 15k RPM
> drives, hardware SCSI (and now SAS) RAID controllers, etc.
>
> I want to store the data in multiple places, both on the local LAN and
> in at least 2 remote locations. That would mean 3 backups. It's
> probably not going to take 36 hours to do that, but it's going to take a
> *lot* more than 12...
>
> Other times, it's not the host's fault, but the Internet connection.
> Maybe it's a host that's behind a low-end DSL that only offers 768k up
> (or worse). It's hard enough to get *one* backup done over that, let
> alone two.
>
> So how can I speed this up?

Brute force approach (rough sketch below): park a Linux box with a big
disk on the local LAN side and do scripted stock rsync backups to it,
making full uncompressed copies with each host in its own directory.
It's not as elegant as a local BackupPC instance, but you get quick
local access to a copy, and it offloads any issues you might have with
the remote transfer. I actually use this approach in several remote
offices, taking advantage of an existing box that also provides VPN and
some file shares. One upside is that you can add the -C option to the
ssh command that runs rsync to get compression on the transfer
(although, starting over, I'd use openvpn as the VPN and add
compression there).

> And once one remote BackupPC server has the data, the
> rest can get it over the very fast Internet connections that they have
> between them. So I only have to get the data across that slow link
> once, and I can still get it to multiple remote locations.

For this case you might also want to do a stock rsync copy of the
backups on the remote LAN to an uncompressed copy at the central
location, then point 2 or more BackupPC instances that have faster
connections at that copy. Paradoxically, stock rsync with the -z option
can move data more efficiently than just about anything, but it
requires the raw storage at both ends to be uncompressed. This might be
cumbersome if you have a lot of individual hosts to add, but it isn't
bad if everyone is already saving the files that need backup onto one
or a few servers at the remote sites. As I've mentioned before, I
raid-mirror to an external drive weekly to get an offsite copy.

> On top of this, the BackupPC server has a much easier task to replicate
> a pool than the host does in the first place. Pooling has already been
> taken care of. We *know* which files are new, and which ones are not.

I don't think you can count on any particular relationship between
local and remote pools.

> There are only two things the replication needs to worry about: 1)
> transferring the new files and seeing if they already exist in the new
> pool, and 2) integrating these new files into the remote server's own pool.

That happens now if you can arrange for the rsync method to see a raw
uncompressed copy. I agree that a more elegant method could be written,
but so far it hasn't been.
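To make the brute-force staging-box idea concrete, here is roughly what
the scripted stock rsync run could look like. This is only a sketch:
the hostnames, paths, and exclude list are made up, not taken from my
actual setup.

    #!/bin/sh
    # Pull a full, uncompressed copy of each host into its own directory
    # on the staging box sitting on the remote LAN.
    #   -a           preserve ownership, permissions, and timestamps
    #   -H           preserve hard links
    #   --delete     make the copy track deletions on the source
    #   -e "ssh -C"  run over ssh with compression on the wire
    for host in fileserver1 fileserver2; do
        rsync -aH --delete \
            --exclude=/proc --exclude=/sys --exclude=/tmp \
            -e "ssh -C" \
            "root@$host:/" "/backups/$host/"
    done

    # The central site can later pull the same uncompressed tree across
    # the slow link, adding -z so rsync compresses the data it sends:
    #   rsync -aHz --delete stagingbox:/backups/ /copies/remote-office/

Everything there is stock rsync over ssh, so BackupPC doesn't have to
know anything about it; the remote BackupPC servers just back up the
uncompressed copy as if it were the original host.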
> Point 2: Long-term Management of Data

LVM on top of RAID is probably the right approach for being able to
maintain an archive that needs to grow and have failing drives replaced
(rough sketch in the P.S. below).

> However, with the ability to migrate hosts from one server to another,
> I can have tiers of BackupPC servers. As hosts are retired, I still
> need to keep their data. 7 years was not chosen for the fun of it: SOX
> compliance requires it. However, I can migrate it *out* of my
> first-line backup servers onto secondary servers.

Again there is a brute force fix: keep the old servers with the old
data, but add new ones at whatever interval is necessary to keep
current data. You'll have to rebuild the pool of any still-existing
files, but as a tradeoff you get some redundancy.

> If my backup load
> increases to the point where one server can no longer handle its load,
> I can divide its load across multiple servers *without* losing its
> history.

An additional approach here would be to make the web interface aware of
multiple servers so you don't have to put everything in one box.

> I have a feeling that many people who run BackupPC might be somewhat
> cavalier with historical data. I *know* I have been: between
> permanently archiving content every 3 months, and not keeping more than
> a few weeks' worth of data on a server, none of these items were a
> hardship.

Most of our real data has its own concept of history built in. That is,
from a current backup of the source code repository you can reconstruct
anything that has ever been added to it. The accounting system likewise
has its own way to report on any period covered in its database if you
have a current copy. There is not much reason to worry about anything
but the current versions of things like that. SOX compliance is
something else, though.

> Trying to keep historical information in place and accessible for
> *years* makes that much harder. It is *guaranteed* that over the 7-year
> life of that data, it's going to live on at least 2 different servers,
> and likely three. The idea of needing to stay locked into the
> configuration that was put in place 7 years ago is *not* appealing.
> Not being able to move hosts from one server to another--where the
> pool is one monolithic block forever--is worrisome.

There is always the image-copy/grow filesystem method.

> Maybe I'm just abusing BackupPC beyond what it was intended to do.
> That's fine, too. But adding the ability to migrate a host from one
> pool to another does not have to change a *single* thing about BackupPC
> at all. It's kind of like archiving hosts: it's a feature you can use,
> or ignore.

Don't take anything I've said about the workarounds to imply that I
don't agree that there should be a better way to move/replicate hosts
and cascade servers.

--
Les Mikesell
[EMAIL PROTECTED]
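P.S. Since I keep pointing at LVM on top of RAID, here is a rough
sketch of how that archive would grow in place. The device names,
volume group, and sizes are all made up, and it assumes a filesystem
(ext3 here) that resize2fs can grow.

    # Build a new RAID1 pair, hand it to LVM as another physical volume,
    # then grow the logical volume holding the pool and the filesystem
    # on top of it.
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
    pvcreate /dev/md2
    vgextend backupvg /dev/md2
    lvextend -L +300G /dev/backupvg/pool
    # Depending on kernel and e2fsprogs versions, the filesystem may need
    # to be unmounted before it can be resized.
    resize2fs /dev/backupvg/pool

The image-copy/grow method I mentioned is the same idea without LVM:
copy the filesystem image onto a larger device and then run resize2fs
to expand it to the new size.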