Speed. BackupPC is constrained by I/O performance, and one bottleneck on the
system is that the storage volume must be a single filesystem because of the
hardlinks. It has been measured a number of times on this mailing list that
I/O is the major bottleneck for BackupPC. Getting faster hardware certainly
helps, but the reliance on a single filesystem for all data is a bottleneck
for performance as well as an irritation when upgrading storage, as you
either need to add additional RAID arrays (as expanding a RAID is not
generally an option) or just use JBOD with LVM or something. Not ideal.
My solution is to break the backup scheme into smaller chunks and have a
number of BackupPC servers, each handling a set number of clients. The issues
here are complexity, as I need to admin a number of servers, and loss of the
file de-duping across servers. In my organization, like many others, the
clients have many absolutely identical files. With 4 backup machines, a
massive amount of data is stored 4 times, PLUS whatever redundancy is in the
RAID.
A hybrid platform can use the filesystem's strengths and a database's
strengths and not have most of the weaknesses of either.
My example was a simplistic one. Sure, MD5 can have some collisions, so
either use MD5+SHA1 or just use SHA-2. You would need to store a few more
pieces of data, but I think it would be hard to argue against MySQL being
many orders of magnitude faster at finding data than a filesystem, just as it
is hard to argue against a filesystem being many times faster at simply
storing files, and even faster at storing large files.
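
To make the idea concrete, here is a rough sketch of what I mean, not a
design for BackupPC itself. It assumes SHA-256 as the SHA-2 digest and uses
Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere.
The table names (blobs, entries) and the helpers open_index() and
store_file() are made up for illustration. The database answers "have we
seen this content before?" and the filesystem only ever stores one copy.

```python
import hashlib
import os
import sqlite3
import tempfile


def open_index(db_path=":memory:"):
    """Create the two hypothetical tables used by this sketch."""
    conn = sqlite3.connect(db_path)
    # One row per distinct file content, keyed by its SHA-256 digest.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        " digest TEXT PRIMARY KEY,"
        " location TEXT NOT NULL)"
    )
    # One row per file seen in a backup; this replaces the hardlink.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " host TEXT NOT NULL,"
        " backup_num INTEGER NOT NULL,"
        " rel_path TEXT NOT NULL,"
        " digest TEXT NOT NULL REFERENCES blobs(digest))"
    )
    return conn


def store_file(conn, pool_dir, host, backup_num, rel_path, data):
    """Record one backed-up file; write its content only if it is new.

    Returns True when the content had to be written to disk.
    """
    digest = hashlib.sha256(data).hexdigest()
    row = conn.execute(
        "SELECT location FROM blobs WHERE digest = ?", (digest,)
    ).fetchone()
    is_new = row is None
    if is_new:
        # Fan out into subdirectories so no single directory gets huge.
        location = os.path.join(pool_dir, digest[:2], digest)
        os.makedirs(os.path.dirname(location), exist_ok=True)
        with open(location, "wb") as fh:
            fh.write(data)
        conn.execute(
            "INSERT INTO blobs (digest, location) VALUES (?, ?)",
            (digest, location),
        )
    conn.execute(
        "INSERT INTO entries (host, backup_num, rel_path, digest)"
        " VALUES (?, ?, ?, ?)",
        (host, backup_num, rel_path, digest),
    )
    conn.commit()
    return is_new


# Four clients back up the same file; only the first one hits the disk.
with tempfile.TemporaryDirectory() as pool_dir:
    conn = open_index()
    data = b"identical file content on every client"
    results = [
        store_file(conn, pool_dir, f"host{i}", 0, "etc/motd", data)
        for i in range(4)
    ]
    print(results)  # [True, False, False, False]
    conn.close()
```

A real implementation would stream large files through the hash instead of
holding them in memory, and would have to deal with crashes between the file
write and the database insert. The sketch ignores both.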
Other benefits of the hybrid system are that the files can be on different
volumes than the database. In fact, because you store each file's location on
disk in the database, you could store files on many different disks, with no
issues with hardlinks. Because of this, you could put two BackupPC machines
together in a cluster, and each instance of BackupPC would look at the same
database (or replicated data in its own database) and be able to do online
replication of the filestore to other servers. They could automatically
duplicate these files on their own local file store, and because there are
not millions of hardlinks to worry about, rsync can actually be useful in
syncing up file stores to other BackupPC machines. Sure, you will still have
a lot of files, but you will have far fewer files for rsync to track. rsync
can handle a lot of files. With BackupPC, rsync actually has to track every
instance of every file from each host and each backup number, plus the pool.
Without the hardlink pooling, rsync would only have to see each file once.
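
A rough sketch of the multi-volume part, again with sqlite3 standing in for
MySQL and with made-up names (pick_volume(), store_file(),
files_to_replicate()). The placement policy here, picking a volume from the
digest, is only an assumption for the example; free space or anything else
would work, since the database remembers where each file went. The point is
that the list a second server has to copy contains each distinct file once,
no matter how many hosts and backup numbers refer to it.

```python
import hashlib
import os
import sqlite3
import tempfile


def open_index():
    """In-memory index with hypothetical tables for this sketch."""
    conn = sqlite3.connect(":memory:")
    # Where the single copy of each distinct content lives.
    conn.execute(
        "CREATE TABLE blobs ("
        " digest TEXT PRIMARY KEY,"
        " volume TEXT NOT NULL,"
        " rel_path TEXT NOT NULL)"
    )
    # Every (host, backup, path) that refers to that content.
    conn.execute(
        "CREATE TABLE entries ("
        " host TEXT NOT NULL,"
        " backup_num INTEGER NOT NULL,"
        " path TEXT NOT NULL,"
        " digest TEXT NOT NULL REFERENCES blobs(digest))"
    )
    return conn


def pick_volume(digest, volumes):
    """Assumed placement policy: spread content over volumes by digest."""
    return volumes[int(digest, 16) % len(volumes)]


def store_file(conn, volumes, host, backup_num, path, data):
    """Store content once on some volume and record who refers to it."""
    digest = hashlib.sha256(data).hexdigest()
    known = conn.execute(
        "SELECT 1 FROM blobs WHERE digest = ?", (digest,)
    ).fetchone()
    if known is None:
        volume = pick_volume(digest, volumes)
        rel_path = os.path.join(digest[:2], digest)
        full_path = os.path.join(volume, rel_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "wb") as fh:
            fh.write(data)
        conn.execute(
            "INSERT INTO blobs VALUES (?, ?, ?)", (digest, volume, rel_path)
        )
    conn.execute(
        "INSERT INTO entries VALUES (?, ?, ?, ?)",
        (host, backup_num, path, digest),
    )
    conn.commit()


def files_to_replicate(conn):
    """One (volume, rel_path) row per distinct content, not per backup."""
    return conn.execute(
        "SELECT volume, rel_path FROM blobs ORDER BY digest"
    ).fetchall()


with tempfile.TemporaryDirectory() as vol_a, \
        tempfile.TemporaryDirectory() as vol_b:
    conn = open_index()
    # 3 hosts x 2 backups, each with the same 2 files.
    for host in ("alpha", "beta", "gamma"):
        for backup_num in (0, 1):
            store_file(conn, [vol_a, vol_b], host, backup_num,
                       "bin/ls", b"ls binary")
            store_file(conn, [vol_a, vol_b], host, backup_num,
                       "etc/hosts", b"127.0.0.1 localhost")
    references = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    print(references, len(files_to_replicate(conn)))  # 12 2
    conn.close()
```

The rows from files_to_replicate() could be written out per volume and
handed to something like rsync --files-from, so rsync never sees the 12
references, only the 2 real files.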
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/