-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of 
John McKown
Sent: Tuesday, July 7, 2020 8:58 AM
To: [email protected]
Subject: [External] Re: Storage & tape question

On Tue, Jul 7, 2020 at 8:19 AM Jackson, Rob <[email protected]>
wrote:

> Fun little note on RAID:  it is fallible.  The last Sunday of October 
> 2016 I got a call bright and early because our VTS (TS7740) had shut down.
> Turns out we had a "cache" HDD failure at around 4 AM, and then a 
> second one failed at around 7 AM, before the first one had been 
> rebuilt on a spare.  RAID-5 could not accommodate it.  Because of IBM 
> politics, we had no tape until Monday at 16:00.  I am ashamed to say 
> that I sort of took tape for granted.  It was astonishing how much of 
> our processing depended on it.
>

We had a similar problem occur, long ago, with an actual SAN DASD array (for 
Windows, not MVS). Weekend backup to physical tape aborted on a Sunday. The 
Windows admin said "No problem, it's a RAID-5 array, I can fix it Monday 
morning." A few hours later, a disk in the array failed. No problem, right? 
Unfortunately, while the CE was on his way in to replace it, a second disk 
failed. The array was destroyed. Management said to repair it and reload from 
the Sunday backup and we'd be good. When the admin admitted that the backup 
failed and he didn't go in, he was immediately terminated. Now, what are the 
chances that 2 drives in an array will fail within hours? I don't know, but one 
thing many don't think about with a "new array" is that all the drives are 
likely the same age and, if they fail at all, will start to fail at about the same time.
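
As a rough back-of-the-envelope answer to the "what are the chances" question, 
here is a minimal sketch in Python. It assumes independent drives with a 
constant failure rate, which is exactly the assumption that same-age drives 
tend to violate, so treat the result as an optimistic lower bound. The function 
name, the drive count, the failure rates and the time windows are all 
illustrative assumptions, not figures from either incident above.

```python
import math

HOURS_PER_YEAR = 8766.0  # average year length in hours (365.25 days)


def p_second_failure(remaining_drives, window_hours, afr):
    """Chance that at least one surviving drive fails within the window.

    Assumes independent drives with a constant (exponential) failure rate
    derived from an annualized failure rate `afr` (0 < afr < 1).
    """
    # Per-drive hourly hazard rate that reproduces the annual failure rate.
    rate = -math.log(1.0 - afr) / HOURS_PER_YEAR
    # Probability that none of the survivors fails, then take the complement.
    return 1.0 - math.exp(-rate * remaining_drives * window_hours)


if __name__ == "__main__":
    # Hypothetical 8-drive RAID-5 set: 7 survivors after the first failure.
    for afr in (0.02, 0.10, 0.30):      # healthy, aging, badly worn (assumed)
        for window in (3, 24):          # hours exposed without redundancy
            p = p_second_failure(7, window, afr)
            print(f"AFR {afr:.0%}, window {window:>2} h: {p:.4%}")
```

The exposure window matters as much as the failure rate: under this model the 
risk grows roughly in proportion to how long the array runs without redundancy.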

IMO, given my paranoia, I firmly believe that the disks in an array should be 
replaced on a scheduled basis. I also believe in dual tape copies of important 
tapes. And also, that tapes in "long term" retention (we have tapes which have 
been at Iron Mountain for over 10 years!) should be brought in and the data 
copied to a new (not reused) tape annually. Of course, the bean counters will 
have an apoplectic fit and scream about how much it costs to do this. They only 
understand cost, not value. I consider them the bane of existence. Like 
auditors, they take on too much authority. Or as I have heard: Fire is a good 
servant but a terrible master.



That was one of the features of the old RVA/SVA array, and it is why I wish IBM 
had followed through on the "rumor" that the XIV was going to have FICON and 
CKD emulation added to it.  The scatter loading of data allowed for very fast 
rebuilds of failed HDAs, minimizing the chance of a second HDA failure taking 
out either the entire array or a cluster of disks.

Alas it didn't happen,

Rex


