At $day_job, we just planned a new backup system for about a petabyte of scientific data, roughly an order of magnitude larger than what you're talking about, but we had to deal with the same problems. A few comments inline below.
On Fri, Oct 19, 2012 at 12:15 AM, Ski Kacoroski <[email protected]> wrote:
> Hi,
>
> I could use some advice on backup options. I have a 4yr old Data Domain that
> has worked perfectly, but it is totally filled (actually overfilled) and
> pricey to maintain. It is located at the remote site connected to my primary
> site by fiber and I just NFS mount it to my backup server. A full backup is
> around 23TB and my backup set of fulls and incrementals around 90TB. My data
> growth has been around 20% a year, but if the school district decides to move
> to student portfolios, it will easily double and maybe triple in a few years.
> I am not a 7x24 shop so for all my applications and databases, I just dump
> the files at night and back them up. I generate about 600GB of long term
> archive data a year that goes to LTO3 tape. The primary purpose is for
> disaster recovery although we do about 1 - 2 file restores a month. 90% of
> the data is on an EMC VNX that I backup via NDMP. So far I am safely within
> my backup window, but that may change if I double or triple the data.
> Options I am looking at are:

Plan for the ability to grow and scale. You don't *have* to actually do so if the data rates stay the same or grow slowly, but you've already identified a case where growth may be an issue. Make sure you can cope with that growth for the next few years.

> 1. Plain disk with an nfs server on it, no dedup. This is definitely the
> least expensive option and can grow cheaply to handle my worst case data
> growth

Just make sure that you have good server hardware, good disks, and appropriate RAID levels (RAID 6 or 10, or ZFS's RAIDZ3).

> 2. Data Domain - very pricey as it is about 5x cost of option #1 for about
> the same logical capacity. At worst case data growth I will need another one
> or another forklift upgrade.

We have two of these, and are replacing them with two larger ones. They work well, but yeah, they are pricey. NetApp also does dedup, so perhaps that's an option?

> 3. Data Domain used - does not come with software support, and about 1.5x
> cost of #1. At worst case data growth I will need another one or another
> forklift upgrade. I am concerned about lack of software support.

For hardware, there are some companies that do "3rd party support". That may be another option as well. But yes, the lack of software support may be an issue. We haven't had any (serious?) software issues with our DDs that I can recall in the past several years. Maybe run at risk, and if you absolutely need software help, whip out the credit card? :-/

> 4. A ZFS system with dedup. About 2x the cost of #1, and from what I hear
> the dedup is not good for this application (e.g. backup software kind of
> breaks dedup on ZFS) so I am assuming minimal dedup savings. This can grow
> to handle worst case data growth.

Echoing what others have said, I've not seen much good about ZFS dedup; it's expensive (RAM and CPU) and not very efficient.

> 5. 4 Drive, 48 slot LTO5 library. Same cost as #1 and by swapping tapes once
> a week or every other week I can handle worst case data growth,

Yes. Do this, and consider using Amanda (or Zmanda) or one of the other open source backup programs.

Tape seems to be pretty maligned these days, but IMO, that's a mistake. The capacity of your backup pool is basically infinite (buy more tapes and drives), the incremental cost of scaling the system is low (individual tapes and drives are cheap relative to the whole system), and the shelf life of a tape is measured in years when stored properly. You can't really say the same for disk-based systems in most cases.

Tape isn't perfect: you have to keep working drives around for old media, you have to deal with tape rotation, etc.

Basically, how much is the data worth? If it's more than the cost of your backup system, you need a good one. How *long* are the backups "valuable"? That determines your retention period.

> 6. Exagrid - I suspect this will be the same cost as the Data domain

Can't comment on them.

> Any other options I should be looking at? What would you do in my case?

We've had decent luck using Coraid hardware (www.coraid.com) for cheap bulk storage. They changed their pricing model a few years back and it stopped making sense from a cost perspective, but it is another option you could consider. You could buy their shelves (and head nodes that run OpenSolaris/ZFS), or build your own shelves (commodity servers with lots of disks) and your own heads (Linux, probably). I'm happy to elaborate on this if there's any interest.

> I appreciate and look forward to your responses.

Good luck!

--
Jesse Becker
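To put rough numbers on the "plan for growth" advice and on option 5, here is a back-of-the-envelope sketch. It assumes LTO-5's native capacity of 1.5 TB per cartridge, ignores hardware compression and incrementals, and treats the 20%/year growth and the 2x/3x portfolio scenarios from Ski's message as simple multipliers. The function names are hypothetical helpers for illustration, not part of any backup tool.

```python
import math

# Assumption: LTO-5 native (uncompressed) capacity of 1.5 TB per cartridge.
# Real results depend on compression ratio and how incrementals are packed.
LTO5_NATIVE_TB = 1.5


def project_growth(size_tb: float, annual_rate: float, years: int) -> list[float]:
    """Size at the start of each year (year 0..years) under compound annual growth."""
    return [size_tb * (1 + annual_rate) ** year for year in range(years + 1)]


def library_capacity_tb(slots: int, tape_tb: float = LTO5_NATIVE_TB) -> float:
    """Native capacity of one full library load (one tape in every slot)."""
    return slots * tape_tb


def fulls_per_load(full_tb: float, slots: int) -> int:
    """Number of complete full backups that fit in one library load, no compression."""
    return math.floor(library_capacity_tb(slots) / full_tb)


if __name__ == "__main__":
    full_tb = 23.0  # current full backup size from the quoted message
    for label, multiplier in [("current", 1), ("doubled", 2), ("tripled", 3)]:
        # Apply the portfolio scenario first, then 20%/year growth for 3 years.
        sizes = project_growth(full_tb * multiplier, 0.20, 3)
        print(label, [round(s, 1) for s in sizes],
              "fulls per 48-slot load today:", fulls_per_load(sizes[0], 48))
```

With a 23TB full, one 48-slot load (72TB native) holds three fulls; if the data triples to about 69TB, it holds only one, which is why regular tape swaps become part of the plan in the worst case.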
