Jon LaBadie wrote:
> On Tue, May 08, 2007 at 10:10:15AM -0400, Chris Hoogendyk wrote:
>
>> Jon LaBadie wrote:
>>
>>> The second part can only be done by actually doing restores.
>>> Perhaps you could schedule periodic recoveries of files
>>> or directory trees. Do some sort of varying selection of
>>> clients, tapes, and data to recover. Maybe even a regular
>>> "the chips are down" disaster exercise.
>>>
>> I would absolutely agree with Jon.
>>
>> You simply cannot be "sure" or "guarantee," but you can attain a level
>> of confidence -- statistical sampling and testing if you want to get
>> formal about it. After installing a new backup system, the first thing
>> after backup should be to test recovery. Then, periodically pull a tape
>> at random and test recovery. Experience and confidence are common terms,
>> but you can also estimate probabilities of future success or failure
>> based on the data if you really want to dig into it.
>>
>>
>
> One thing I dislike about "random" sampling is the possibility of never
> testing certain combinations. I think the statistical approach would
> give me more confidence, particularly if all combinations were regularly
> tested in a reasonable time frame. Of course your reasonable time frame
> might seem excessive to me ;)
>

Statistical sampling can be done in many ways, designed to cover different situations and starting with the definition of the population to be sampled (including sub populations). Simple Random Sampling (srs) is what the lay person thinks of as "random" sampling. However, if you crack a book on Statistical Sampling, you will find many many chapters on different models and approaches to sampling.

That's why I suggested he visit the stat lab. Used to be that students, faculty and staff could walk in there and get help. I haven't been there since 1981, so it may have changed.

Sampling Design works backwards from either partial data or assumptions about the population and allows you to determine what sample size or frequency you need to attain a certain level of confidence or precision of estimation. What you may end up doing is simply more precisely estimating your failure rate. But, then you could use that information to augment your backup procedures, if you thought your failure rate was higher than you were willing to accept.

If you take this idea and put it on a time sequence going forward, then it becomes a sort of early warning system. When the current estimate of failure rate reaches some critical threshold, it's time to ... replace tapes, replace some hardware, figure out what the problem is, ... or whatever.

I suppose anyone who had a large enough installation and felt the need to take it to that much depth could also afford to hire a statistical consultant. Others of us have to fly by intuition.

>> The other side of this is your own personal experience and confidence.
>> When "the chips are down", you can say, "Ah, I've done that a bunch of
>> times. I'm confident I can do it now."
>>
>> You need both of those in the common sense.
>>
>
> The confidence and experience aspect is a great point. And if your
> backup system is worth 12K as was stated, then there are probably
> multiple people who need to gain that experience and confidence.
> Not just the one with primary backup responsibility who invariably
> happens to be on vacation just when "the chips are down".
>

Hey, if he doesn't want to share the joy, he can always carry a beeper on vacation. ;-)


---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<[EMAIL PROTECTED]>

---------------

Erdös 4
