Our database is currently at 370GB, only 61% full. It has been larger, and more troublesome, in the past, and we are on a roadmap to get it smaller by continuing to split our servers. I will tell you that it is easier to start a second server before you need it than it will be to move nodes to it later.

We have experienced the pains of a too-large database. In addition to the two issues that Roger mentioned (db backup and expiration), other symptoms you may see are longer query times (e.g., when listing files to restore from a large client) and longer restore times (again, because the query at the beginning of the restore takes longer with a larger database). We would like to unload/reload our database (it is over 10 years old now, and very fragmented), but the time to do this for such a large database is prohibitive. That is another reason you might want to limit your database size.
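For anyone weighing an unload/reload, the sequence itself is simple; it is the elapsed time that kills you. A minimal sketch as I understand it for a 5.x server, with the server halted first; the device class (DBDUMP), dump volume, and database/log volume names are all placeholders, and you should verify the exact steps and arguments against the Administrator's Reference for your level:

   # dump the database to sequential media:
   dsmserv unloaddb devclass=dbdump volumenames=dbdump.vol1

   # reinitialize the recovery log and database volumes for the reload:
   dsmserv loadformat 1 /tsm/log.vol1 1 /tsm/db.vol1

   # reload; the database comes back reorganized and compacted:
   dsmserv loaddb devclass=dbdump volumenames=dbdump.vol1

On a database our size, the outage for the unload and reload together is what makes it prohibitive.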
I will offer a dissenting opinion from Roger's on using striping or RAID for database volumes. I will start by saying that I once shared Roger's view on this. Striping or RAID in and of itself does not necessitate multiple-spindle access for database reads: if your stripe size is set appropriately, you should only have to access one spindle to read a database page. (For example, with TSM's 4KB database pages, a stripe unit of 4KB or larger that is aligned to page boundaries keeps each page read on a single disk.) Writing database pages is another story; however, if you have a large write cache in front of your dbvols, then write I/O should not be a problem.

TSM, IMHO, does not do a good job of spreading I/O across dbvols. It does not do round-robin allocation; rather, it will allocate new pages all on one volume, and only after that volume fills up will it start allocating on the second volume. By using striping or RAID, you will spread the I/O across spindles more effectively. Even if your database eventually grows and spreads itself across multiple spindles, pages from a particular user are likely to be grouped onto one spindle (with JBOD), causing queries for that user to incur head contention. I would be interested in hearing what others have to say on this matter.

When we moved from 10k rpm SSA drives (non-RAID, with cache on the SSA adapter) to 15k rpm FAStT RAID5 arrays, we saw a world of improvement on our server. I attribute this to a combination of faster drives (15k rpm) and spreading the I/O across spindles using RAID5. The ease and simplicity of replacing FAStT drives when they fail is another strong reason to use RAID5 instead of JBOD: the FAStT will automatically rebuild onto a hot spare drive, and you don't need to do anything other than replace the bad drive with a new one. No fuss, no muss.

I will echo Roger's comments on how long it takes to catch up on expiration. It can take weeks or months to catch up if you don't notice how far behind it has gotten. We have since added some metrics to our Servergraph monitoring tool to help us monitor expiration performance and cache-hit ratios (a sketch of the sort of check involved follows below).

We also run multiple TSM images per AIX server; you don't need multiple boxes to run multiple servers (also sketched below). I will say that managing one server is conceptually simpler than managing multiple servers, but only until you start running into these problems. IMHO, IBM needs to address this issue either by making a single large server more feasible, or by providing additional tools to manage a collection of servers. Specifically, it would be great if we could migrate a node from one server to another, without having to make a change to the client option file to point it at the new server.
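For anyone wanting to watch the same numbers without Servergraph, they are all available from the administrative CLI. A minimal sketch, assuming a 5.x server and a read-only admin ID named MONITOR (the ID, password, and exact SUMMARY-table columns are placeholders you should check at your level):

   # database utilization and buffer-pool cache hit percentage:
   dsmadmc -id=monitor -password=secret "query db format=detailed"

   # rough expiration throughput, from the SUMMARY table:
   dsmadmc -id=monitor -password=secret \
      "select start_time, end_time, affected from summary where activity='EXPIRATION'"

The usual rule of thumb is to keep the cache hit percentage up around 98-99%; if it drops below that, or expiration stops keeping pace with daily backups, that is your early warning.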
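And for anyone who has not tried running multiple images on one box: each instance just needs its own directory, options file, and port. A minimal sketch on AIX, with placeholder paths under /tsm; the DSMSERV_DIR and DSMSERV_CONFIG environment variables tell each dsmserv which instance it is:

   # first instance
   export DSMSERV_DIR=/tsm/server1
   export DSMSERV_CONFIG=/tsm/server1/dsmserv.opt
   nohup dsmserv quiet &

   # second instance (its own db/log volumes, and a different TCPPORT in dsmserv.opt)
   export DSMSERV_DIR=/tsm/server2
   export DSMSERV_CONFIG=/tsm/server2/dsmserv.opt
   nohup dsmserv quiet &

Each instance gets its own database, recovery log, and storage pools; they share nothing but the hardware.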
..Paul

--
Paul Zarnowski                            Ph: 607-255-4757
Manager, Storage Systems                  Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801    Em: [EMAIL PROTECTED]