-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of John McKown
Sent: Tuesday, July 7, 2020 8:58 AM
To: [email protected]
Subject: [External] Re: Storage & tape question

On Tue, Jul 7, 2020 at 8:19 AM Jackson, Rob <[email protected]> wrote:

> Fun little note on RAID: it is fallible. The last Sunday of October
> 2016 I got a call bright and early because our VTS (TS7740) had shut down.
> Turns out we had a "cache" HDD failure at around 4 AM, and then a
> second one failed at around 7 AM, before the first one had been
> rebuilt on a spare. RAID-5 could not accommodate it. Because of IBM
> politics, we had no tape until Monday at 16:00. I am ashamed to say
> that I sort of took tape for granted. It was astonishing how much of
> our processing depended on it.
>

We had a similar problem occur, long ago, with an actual SAN DASD array (for Windows, not MVS). The weekend backup to physical tape aborted on a Sunday. The Windows admin said "No problem, it's a RAID-5 array, I can fix it Monday morning." A few hours later, a disk in the array failed. No problem, right? Unfortunately, while the CE was on his way in to replace it, a second disk failed. The array was destroyed. Management said to repair it and reload from the Sunday backup and we'd be good. When the admin admitted that the backup had failed and he hadn't gone in, he was immediately terminated.

Now, what are the chances that 2 drives in an array will fail within hours? I don't know, but one thing many don't think about with a "new array" is that all the drives are likely the same age and will start to fail (if they are going to) at about the same time. IMO, given my paranoia, I firmly believe that the disks in an array should be replaced on a scheduled basis. I also believe in dual tape copies of important tapes. And also, that tapes in "long term" retention (we have tapes which have been at Iron Mountain for over 10 years!) should be brought in and the data copied to a new (not reused) tape annually.

Of course, the bean counters will have an apoplectic fit and scream about how much it costs to do this. They only understand cost, not value. I consider them the bane of existence.

Like auditors, they take on too much authority. Or as I have heard: Fire is a good servant but a terrible master.

That was one of the features of the old RVA/SVA array, and it is why I wish IBM had followed through on the "rumor" that the XIV was going to have FICON and CKD emulation added to it. The scatter loading of data allowed for very fast rebuilds of failed HDAs, minimizing the chance that a second HDA failure would take out either the entire array or a cluster of disks.

Alas, it didn't happen.

Rex
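
On the question above of how likely it is that 2 drives in an array fail within hours of each other, here is a rough back-of-the-envelope sketch. Everything in it is an assumption rather than data from either incident: an 8-drive array, independent drives with exponentially distributed lifetimes, and made-up annualized failure rates (AFR). The 3-hour window matches the 4 AM to 7 AM gap in Rob's story. The function name is hypothetical. Same-age drives wearing out together would break the independence assumption and push the real number higher, which is the point about "new arrays" made above.

```python
import math

HOURS_PER_YEAR = 8760.0


def second_failure_probability(drives, window_hours, afr):
    """Chance that at least one surviving drive fails within window_hours.

    Assumes independent drives with exponentially distributed lifetimes,
    where afr is the probability that a single drive fails within one year.
    """
    survivors = drives - 1  # one drive has already failed
    # Convert the annual failure probability to a constant hourly hazard rate.
    hourly_rate = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-survivors * hourly_rate * window_hours)


if __name__ == "__main__":
    # Illustrative AFR values only: a healthy drive versus an aging batch.
    for afr in (0.02, 0.10, 0.30):
        p = second_failure_probability(drives=8, window_hours=3.0, afr=afr)
        print(f"AFR {afr:.0%}: P(second failure within 3 h) = {p:.5f}")
```

Under these assumptions the chance is small for healthy drives and grows as the per-drive failure rate climbs or the rebuild window lengthens, which is why fast rebuilds and scheduled replacement both help.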
