Ski Kacoroski <[EMAIL PROTECTED]> wrote on 01/17/2008 11:40:15 AM:

 > I have actually done the replication for a smaller site where BackupPC
 > is backing up about 10 machines so it is only running for a few hours a
 > day.  When it is not running I have an rsync to a separate machine
 > run.  The BackupPC instance on that machine can easily be used for
 > restores and, if I turn it on, for backups.  I typically keep BackupPC
 > on the second machine turned off as it is only used in case the first
 > machine fails.  I am not sure if this will scale because of the hard
 > links, but perhaps if I used DRBD I could get around that problem.

You have described a jury-rigged active/standby cluster, with a manual 
sync between the two.  It certainly works, as far as it goes.  And I am 
implementing something similar with DRBD.  But I am hoping for better.

There's a lot of button-pushing required in your solution:  making sure 
that BackupPC isn't actively doing anything on the active server (or 
actually stopping BackupPC), then rsyncing the data to the other box 
(which everyone says is not scalable without *tons* of RAM on both 
boxes), then possibly starting BackupPC again.  And there's no 
coordination between the two instances:   the moment the copy is 
finished, they're two separate pools with separate schedules, etc. 
Which is why, of course, you have to leave BackupPC shut down on the 
other box.  DRBD helps:  it eliminates both shutting down BackupPC and 
rsync limitations for duplicating the pool.  You could also do the same 
thing with breaking a RAID-1 mirror:  Les brings this up every time!  :) 
  But you're still left with two independent servers, or at best an 
active/standby cluster.

There are two separate solutions I would love to see.  One is the 
ability to run BackupPC in an active/active cluster:  two (or more) 
BackupPC servers with a common pool between them.  Both servers would be 
able to simultaneously run jobs (but likely not on the same host, of 
course).  This would help scalability:  I have to partition into 
multiple servers because I run out of time before I run out of disk 
space--to create a multi-terabyte array today is trivial.  This would 
also give me N+1 redundancy on my backup servers, which I also very much 
like, all with a single pool, and therefore managed by a single GUI, but 
with multiple front ends.
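One primitive such a cluster would need is "never run two jobs for the same host at once."  Purely as an illustrative sketch (BackupPC has no such mechanism today), each front end could take a per-host advisory lock on the shared pool filesystem before starting a job.  Note the caveat: flock(1) locks are advisory and unreliable over NFS, so a real shared-pool cluster would need something stronger.  All names below are invented:

```shell
#!/bin/sh
# Hypothetical per-host lock for an active/active BackupPC pair.
LOCK_DIR=$(mktemp -d)   # on real servers: a directory on the shared pool

backup_host() {
    host=$1
    (
        # -n: fail immediately instead of waiting if another front
        # end already holds this host's lock
        flock -n 9 || { echo "$host: busy on another front end"; exit 1; }
        echo "$host: lock acquired, running backup"
        # ... the actual BackupPC job would run here ...
    ) 9>"$LOCK_DIR/$host.lock"
}

backup_host www1
```

The subshell holds file descriptor 9 open on the lock file for the duration of the job, so the lock is released automatically even if the job dies.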

The beauty of this is that conceptually it keeps the typical case (one 
pool, one BackupPC front end) exactly the same:  no extra complexity for 
most people, just the ability to add more front ends as you might want. 
  After all, most people (me included!) *prefer* BackupPC *because* it 
is a very simple system.

The second solution I would love to see is the ability to replicate 
individual backup jobs from one pool to another.  Imagine an archive 
function that instead of storing a backup on a tape, could forward a 
backup job to another BackupPC server.  What I envision is some sort of 
mechanism where one BackupPC server runs an rsyncd backup against 
another server that provides an rsync service exposing the host's 
data.  In other words, I'm not looking to do it at the pool level: 
rather, the receiving BackupPC server could process the incoming data 
exactly as if it were receiving it from any other host, just without 
loading the original host with multiple backups.  That extra load on 
the client is what I want to avoid.  My BackupPC servers have much 
more bandwidth between them than any of them have to their clients...
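The transport side of that idea might look like an ordinary rsyncd module on the sending server, exporting one host's backup tree read-only to the receiving server.  This fragment is entirely hypothetical (no such feature exists; module name and paths are invented), and it glosses over the hard part: BackupPC stores files compressed under mangled names, so a real service would have to present a decoded view of the backup rather than the raw pc/ tree:

```ini
# Hypothetical rsyncd.conf fragment on the *sending* BackupPC server
[host-www1-latest]
    path = /var/lib/backuppc/pc/www1/123
    read only = yes
    uid = backuppc
    gid = backuppc
    hosts allow = backuppc2.example.com
```

The receiving server would then be configured to back up "www1" via this module like any other rsyncd client, which is what lets it pool and schedule the data normally.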

This would require the creation of a new Xfer type (like archive), but 
would intentionally not change anything else about the way BackupPC 
works:  it's still receiving files for a particular host that would then 
need to be processed in the normal way BackupPC processes incoming 
files.  It might be great if some optimizations could be made to the 
data that is sent back and forth (maybe the BackupPC systems could share 
hashes or transfer the files in already-compressed format or similar), 
but only if this does not require changes to the rest of BackupPC.

Alternatively, the ability to intelligently replicate a portion of the 
pool from one BackupPC server/cluster to another without the memory 
limitations that rsync brings would be great, too.  But I'm perfectly 
happy without it:  I don't mind making BackupPC un-pool or re-pool 
whatever data it sends or receives from remote systems (either hosts or 
other BackupPC servers in my second idea above), but hopefully using an 
rsync-style transfer.  If I truly just want a mirror-image second copy 
of the pool, we can already do that (with RAID-1 or DRBD).

Tim Massey

_______________________________________________
BackupPC-users mailing list
[email protected]
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
