Hi,

Les Mikesell wrote on 2014-03-19 11:25:38 -0500 [Re: [BackupPC-users] Centralized storage with multiple hard drives]:
> On Wed, Mar 19, 2014 at 5:53 AM, thorvald
> [...]
> With rsync/rsyncd xfers, you would at least get hardlinks to identical
> files in different runs on the same target.
make that "same files" in the sense of the XferLOG message - files that are unchanged in comparison to the reference backup won't need extra space. If you have identical content within one backup (say 100 copies of CVS/Root), you'll still have individual storage for each instance of the content. You get exactly that from 'rsync --link-dest', by the way.

Les Mikesell wrote on 2014-03-19 14:35:17 -0500 [Re: [BackupPC-users] Centralized storage with multiple hard drives]:
> On Wed, Mar 19, 2014 at 1:48 PM, Timothy J Massey <tmas...@obscorp.com> wrote:
> >
> > > Let's say that the storage is not a problem for me and I can have as
> > > many TB or PT as I need. However the main assumption is that every
> > > box has got a separate "disk" to be backed up to. So now I faced the
> > > problem with BackupPC which does use pool or cpool to store files
> > > within :/. I don't need any compression or deduplication. Is there
> > > any way to backup files directly to pc/HOST/ instead ?
> >
> > I am going to give a flat "no" to this. You may be able to break things
> > within BackupPC to accomplish this (never run the link, for example), but
> > you are *breaking* things. Don't do that if you expect *anyone* to be
> > able to help you.
>
> I don't know about that

Well, I do. You'd be running modified code that behaves contrary to the expectations on this list, and that isn't what you assume it to be: tested and proven. You'd get misleading help, because we wouldn't know what you modified and how. You'd waste your time and ours sorting that out.

The point here is that BackupPC may simply not be the right tool. You could probably modify Apache to serve TFTP or DNS requests, but why on earth would you do that if those are the only things you're going to use it for? There are other tools that already do those jobs without modification. BackupPC is about deduplication. It puts a lot of effort into this (in terms of code path, CPU and disk utilisation).
If all you really need is a smart rsync invocation and an expiration logic, then why incur the unneeded overhead literally hundreds of times?

> - people on the list report fairly often that they are using too much disk
> space and it turns out that links have been failing for one reason or
> another - but they still have working backups.

Right. And the first thing we tell them to do is: fix linking. That's not interesting. The question is, what do we tell them when they *don't* have working backups? "Find out why linking isn't working. Your problem is probably related to the cause of that. If not, start fresh with working linking. If your problem persists, then come back."

> > > I'm not going to backup couple of hundreds servers using one
> > > BackupPC instance of course but I want to back up at least 100
> > > servers per BackupPC instance.
> > >
> > > Is there something you could advise me ?
> >
> > Sure: use virtualization. Create your huge datastore (or multiple
> > datastores) and create a VM for each unit that needs its own pool.

That doesn't seem to fit "at least 100 servers per BackupPC instance" (and "separate disk for each host").

> Interesting concept, but it seems like it would add a horrible amount
> of overhead in terms of setup and maintenance - even just tracking
> which VM does which backup. Although - maybe it would mesh with
> whatever is driving the idea of keeping the backups separate.

Maybe. We're just guessing.

> You'd have scheduling issues that a single server would sort out, though.

Right, and you probably need synchronisation of backups for the sake of your network bandwidth, even if not for the disks.

> > There are other things that you'll have to worry about, no matter whether
> > it's a single instance or multiple VM's. The screamingly obvious one is
> > disk performance. [...]
>
> Throwing RAM at a disk performance problem usually helps.

You've used BackupPC before, Les, right?
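(Back to the "fix linking" advice for a moment: one quick way to spot failed linking is to look for files under the per-host trees with a link count of 1, i.e. files that never received a second link into the pool. The miniature tree below is fabricated purely to illustrate the check; on a real installation you would point find at your pc/ directory instead.)

```shell
#!/bin/sh
# Fabricated miniature of a BackupPC-like layout, for illustration only.
set -e
top=/tmp/fakepool_demo           # hypothetical scratch location
rm -rf "$top"
mkdir -p "$top/pc/host1" "$top/cpool"

echo "linked" > "$top/pc/host1/linked"
ln "$top/pc/host1/linked" "$top/cpool/0123abcd"   # simulated pool link
echo "orphan" > "$top/pc/host1/orphan"            # linking "failed" here

# Files with a link count of 1 were never linked into the pool:
find "$top/pc" -type f -links 1
```

A large count from such a find on a real pc/ tree would corroborate that BackupPC_link has been failing.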
;-)

BackupPC prefers pool reads over writes where possible, and it typically accesses large amounts of data almost randomly. Caching metadata will help; caching data likely won't. The benefit from cached metadata is mainly in the {c,}pool structure, I would expect - and that benefit would be removed, along with a lot of the disk reads, if you have no (working) pool. The part of the problem you are trying to fix most probably vanishes.

On the other hand, eliminating pooling means that each file in your backup set will be stored independently and accessed in roughly the same order on each backup. Unless you can cache the complete data from a backup (and keep it cached until the next backup runs), you gain nothing from caching any data.

> As does not using raid5.

One disk per client host sort of precludes raid5 ;-).

Regards,
Holger

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/