Hi Timothy,

You bring up some interesting points, but I'd like to look at it from a different POV. Do most companies have a fully redundant hot site where they can move a single piece of equipment (whether it be a mainframe, a midrange/Unix system, or a Windows server) to the alternate site with the flip of a switch? While I have never had to take part in a real disaster, I've participated in tabletop DR scenarios, and *always* one of the first things discussed has been "What is the cost (in dollars as well as outage time/downtime and people cost for the disruption) of moving to the DR site versus just biting the bullet and waiting for the primary site to come up?"

Now combine that with the fact that WV is apparently running severely outdated equipment (a self-inflicted wound here) supported by a third-party vendor. Depending on how old the equipment actually is, would there even be a third-party DR site that would be able to support their system?
It wasn't too many years ago that my site at the time had a DR contract with IBM BCRS. We were running an older mainframe that was still supported by IBM, but when it came to doing a DR test we couldn't go to the BCRS site our contract was with (and to which we had a permanent network link) because that particular BCRS site had no hardware old enough to support us. We had to go to a different BCRS site, and IBM had to back-haul our network connection from their primary site to the one they put us in.

My guess is that WV backed themselves into a corner by not allowing their IT staff the resources to keep their systems current, woefully underfunding the capital needed to keep them running, and now the "savior du jour" is coming in riding on a cloud and that's going to save the day for them.

Rex

-----Original Message-----
From: IBM Mainframe Discussion List <[email protected]> On Behalf Of Timothy Sipples
Sent: Wednesday, July 27, 2022 12:12 AM
To: [email protected]
Subject: [EXTERNAL] Re: Mainframe outage affecting W.Va. state agencies could take 48, 72 hours to resolve

I have absolutely no information about this incident other than what the media are reporting. I wish everyone involved the best success.

My *personal* curiosity revolves around the Disaster Recovery plan and resources. As I'm sure we all know the standard/typical operational practice is to have an alternate site, separated at some distance, equipped with standby resources. Disk subsystems replicate between sites (primary to alternate) either synchronously or asynchronously. Or at least there'd be a remote tape library, preferably virtual to some degree (for performance reasons), preferably with multiple incremental backups per day. If the primary site is lost, for whatever reason(s), the IT operations team restores at least critical services from the alternate site. It might be a long RTO (24 hours for example) if it's a basic/entry DR arrangement, but it'd be something.
Over many years I've only ever worked with two clients that had no real DR plan and essentially no DR resources when I first met them. As it happens they were both government agencies, but they were also both located in fairly poor or poorer developing countries. One client took frequent tape backups and shuttled physical tapes off-site so at least they'd be able to recover to some point, eventually. (RTO="a week or two," RPO=12+ hours probably.) I wasn't happy they had to operate that way, but their constraints were genuine. I worked with the other government agency to eliminate their exposure within a tight budget, and they now have an alternate site with a reasonable DR capability.

I also remember working with another customer in a developing country, a bank. They were upgrading their systems, and their original plan involved losing DR protections for a couple days (about 48 hours) as I recall. That plan troubled me, so I worked with them to create a better, safer plan that preserved DR coverage throughout the upgrade project. They chose the revised plan. They completed their upgrade project on-time, within budget, and without incident.

So what happened to the alternate site (and DR switchover to it)?

- - - - -
Timothy Sipples
Senior Architect
Digital Assets, Industry Solutions, and Cybersecurity
IBM zSystems/LinuxONE, Asia-Pacific
[email protected]

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
