dan wrote at about 10:30:17 -0600 on Sunday, August 23, 2009:
> Speed. BackupPC is constrained by I/O performance, and one bottleneck on
> the system is that the storage volume must be a single filesystem due to
> hardlinks. It has been measured a number of times on this mailing list
> that I/O is the major bottleneck for BackupPC. Getting faster hardware
> certainly helps, but the reliance on a single filesystem for all data is
> a bottleneck for performance as well as an irritation when upgrading
> storage, as you either need to add additional RAID arrays (as expanding
> a RAID is not generally an option) or just use JBOD with LVM or
> something. Not ideal.
>
> My solution is to break the backup scheme into smaller chunks and have a
> number of BackupPC servers, each handling a set number of clients. The
> issues here are complexity, as I need to admin a number of servers, and
> loss of the file de-duping. In my organization, like many others, each
> client will have absolutely identical files. 4 backup machines means
> that a massive amount of data is duplicated 4 times PLUS whatever
> redundancy is in the RAID.
>
> A hybrid platform can use the filesystem's strengths and a database's
> strengths and not have most of the weaknesses.
>
> My example was a simplistic one. Sure, MD5 can have some collisions, so
> either use MD5+SHA1 or just do SHA2. You would need to store a few more
> pieces of data, but I think it would be hard to argue that MySQL is not
> many orders of magnitude faster at finding data than a filesystem, just
> as it is hard to argue that a filesystem is not many times faster at
> simply storing files, and even faster at storing large files.
>
> Other benefits of the hybrid system are that the files can be on
> different volumes than the database. In fact, because you store the
> file's location on disk in the database, you could store files on many
> different disks, with no issues with hardlinks. Because of this, you
> could put two BackupPC machines together in a cluster, and each instance
> of BackupPC would look at the same database (or replicated data on their
> own database) and be able to do online replication of the filestore on
> other servers. They could automatically duplicate these files on their
> own local file store, and because there are not millions of hardlinks to
> worry about, rsync can actually be useful in syncing up file stores to
> other BackupPC machines. Sure, you will still have a lot of files, but
> you will have a lot fewer files for rsync to track. rsync can handle a
> lot of files. With BackupPC, rsync actually has to track every instance
> of every file from each host and each backup number, plus the pool.
> Without the hardlink pooling, rsync would only have to see each file
> once.
>
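To make the hybrid idea concrete, here is a rough sketch of how I read it: file contents live once on disk under their digest, and a database maps every (host, backup number, path) to that digest and records which volume holds the blob. None of this is BackupPC code. The table names, the function names, the use of SQLite instead of MySQL, SHA-256 as the "SHA2" choice, and the "most free space" volume policy are all my own assumptions for illustration.

```python
import hashlib
import os
import shutil
import sqlite3


def open_index(db_path=":memory:"):
    """Create the (hypothetical) index: one row per unique blob, one per backed-up file."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        " digest TEXT PRIMARY KEY, volume TEXT NOT NULL, size INTEGER NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " host TEXT NOT NULL, backup_num INTEGER NOT NULL, path TEXT NOT NULL,"
        " digest TEXT NOT NULL REFERENCES blobs(digest),"
        " PRIMARY KEY (host, backup_num, path))"
    )
    return db


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def blob_path(volume, digest):
    """Fan blobs out into subdirectories by the first two hex digits of the digest."""
    return os.path.join(volume, digest[:2], digest)


def store_file(db, volumes, host, backup_num, src_path):
    """Record one backed-up file; copy its contents only if the digest is new."""
    digest = sha256_of(src_path)
    row = db.execute(
        "SELECT volume FROM blobs WHERE digest = ?", (digest,)
    ).fetchone()
    if row is None:
        # Assumed policy: put new blobs on whichever volume has the most free space.
        volume = max(volumes, key=lambda v: shutil.disk_usage(v).free)
        dest = blob_path(volume, digest)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(src_path, dest)
        db.execute(
            "INSERT INTO blobs VALUES (?, ?, ?)",
            (digest, volume, os.path.getsize(dest)),
        )
    # The database row takes the place of the hardlink into the pool.
    db.execute(
        "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)",
        (host, backup_num, src_path, digest),
    )
    db.commit()
    return digest
```

With something like this, the same file backed up from two hosts produces two rows in entries but only one blob on disk, and the blob can sit on any of the listed volumes because nothing depends on hardlinks staying within one filesystem. Syncing the file store to another machine then only has to walk each unique blob once.
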
The hybrid system also has many other advantages, including:
- Allows BackupPC to work on OSes/filesystems that don't support Unix-type
  hardlinks, such as Windoze
- Allows for more expandable, robust, and faster storage of metadata.
  Continuing to expand attrib files to include ACLs and other extended
  attributes will just make the hack messier and slower.
- Allows for more granular security and access controls to backups

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/