Dean Nai wrote:
Currently we backup everything on 3592 nightly and the next
morning they go off site for DR purposes, circa 1980’s
OK, so my guess was a good one then. :-)
So right now your "Recovery Point Objective" (RPO) is circa 36 hours,
probably higher. For example, suppose you start your backup at 10:00 p.m.,
the tapes physically leave at 9:00 a.m., they arrive at your DR site at
10:00 a.m., and each later batch arrives at 10:00 a.m. on the following
days. Just before the next batch arrives, the newest data at the DR site
dates from 10:00 p.m. two nights earlier:
that's your 36 hours. If, organizationally, your 36 hour RPO is known,
documented, agreed, accepted, and periodically reviewed, great. Then your
goal is simply to get the best (lowest) RPO you can within your budgetary
and other limits, while still meeting the 36 hour RPO at a minimum.
Basically, you should be free and welcome to beat the required 36 hour
RPO as long as you don't spend any extra taxpayer dollars to beat it. :-)
If the current 36 hour RPO isn't acceptable, different story. It doesn't
seem like RPO=36h would be or should be acceptable, but let's assume for
now it's acceptable.
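To spell out the arithmetic behind that 36 hour figure, here's a minimal
Python sketch. The calendar dates are made up for illustration, the
function name is my own, and it uses the same simplification as the example
above (measuring up to the moment the next batch arrives at the DR site).

```python
from datetime import datetime, timedelta

def worst_case_rpo(backup_start, tapes_arrive, cycle=timedelta(hours=24)):
    """Worst-case age of the newest data at the DR site.

    Until the next batch arrives (one cycle after this one), the newest
    recoverable data at the DR site dates from backup_start.
    """
    next_arrival = tapes_arrive + cycle
    return next_arrival - backup_start

# Hypothetical dates: backup starts Monday 10:00 p.m., tapes arrive
# Tuesday 10:00 a.m., and the next batch arrives Wednesday 10:00 a.m.
backup_start = datetime(2020, 1, 6, 22, 0)
tapes_arrive = datetime(2020, 1, 7, 10, 0)

rpo = worst_case_rpo(backup_start, tapes_arrive)
print(rpo.total_seconds() / 3600)  # 36.0
```

Any delay in the backup itself, the pickup, or the delivery only pushes that
number higher, which is why I say "probably higher."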
I'd like to comment first on some ideas Rex raises:
Rex Pommier wrote:
Since it's leased, I don't know if this would even be feasible:
sticking with the 3592-C07 controller but going to third party
maintenance. IDK how long parts will be available for them.
It's a fair question to research, as a contingency anyway. The DR site
operator may have a point of view, of course.
Will IBM offer some kind of extended (read: for an extra fee)
support for the C07 controllers? That'll depend on parts availability
as well. We got it for our C06s while we were migrating off the
local physical tape.
Also a fair question to research, as a contingency.
Another contingency possibility to research is "stockpiling." For example,
if you have one 3592-C07 controller per site, install a second one per site
(two more), procured from the secondary market. If one breaks and isn't
repairable, the other hopefully still works. I don't recommend this, and
it's not really a long-term solution.
Installing a small VTS at your primary site, and swinging the
(presumably) 3584 library behind the VTS. The VTS would then become
in essence a large cache for the tape library. This is one of the
options we looked at when our 3592-C06 controllers fell off maintenance
- because we couldn't find any C07s to replace them with.
Yes, the direct replacement is a "smart" controller, one so smart that you
aren't obliged to equip it with physical tape libraries and tape drives.
Currently that's the IBM TS7770, which became generally available on
November 22, 2019. The TS7770 currently offers optional support for IBM
TS1120 through TS1150 tape drives, depending on which suitably configured tape
library you have (TS3500 or TS4500). When attached to physical tape
libraries/drives it's known as the "TS7770T." The TS7760 is also currently
available.
I understand Dean has some concerns about direct replacement economics,
though. If the data volumes are quite low, then he could be right.
We ended up installing a larger VTS at the local site and a smaller
one at our secondary location, replicating between them, but we did
put a 3584 library on the back end of the remote one so we still have
data going to tape for long term archival.
Yes, that's quite common.
If we consider Dean's current DR strategy mostly in isolation, and meeting
or beating the current RPO, I'd like to nominate some possible parsimonious
options, in no particular order and often combinable:
1. Use a combination of nearline disk (rather than virtual or physical
tape) as your backup target, Safeguarded Copy (a feature available in IBM
DS8880 and DS8900F storage units), and Global Mirror. This IBM Redpaper
describes Safeguarded Copy:
http://www.redbooks.ibm.com/redpapers/pdfs/redp5506.pdf
The primary considerations are whether you already have the storage (or
plan to this year) and whether you have the right network connectivity to
your DR site to support Global Mirror (versus the current "tape pickup
truck" connectivity).
2. Use cloud object storage along the lines I suggested. Your backup target
and recovery source are the public cloud (such as IBM Cloud Object
Storage), and you simply add the right software to z/OS to use cloud object
storage as if it were (virtual) tape, such as IBM Cloud Tape Connector for
z/OS.(*) As long as at least one cloud object storage pool with a good
backup is adequately reachable from the DR site when you need to recover,
and provided you have a minimum emergency z/OS image (including the cloud
object storage enabling software) for quick IPL from emergency media (DVD
with older HMCs, or USB media on newer) if you need it, you can restore
from the cloud backup to the last good backup point.
There are some variations, such as contracting with two cloud object
storage providers and running backups to both (just in case one is offline,
somebody forgot to renew the contract, or whatever), using one public
provider and one private provider (whatever your larger organization has
for cloud object storage, and they likely have something already), using
cloud object storage that your DR site operator provides as one of the
pools, and so forth.
3. Place one IBM TS7770 -- it could be without physical tape libraries and
tape drives -- at your DR site, and run your backups to that remote virtual
tape library. That gets your backup data off site right away. This too
requires sufficient network connectivity to your DR site, although it isn't
quite as demanding as Global Mirror.
There are some variations here, too. For example, some shops effectively
run a third "data vault" site. They place one TS7770 across campus in a
completely different building, with no machine or disk, and that's the
"arm's length backup vault." They place another TS7770 at the DR site, the
"remote vault." The "arm's length" TS7770 then replicates to the "remote"
TS7770.
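For options 1 and 3, one back-of-envelope way to judge whether the network
link is "sufficient" is to estimate how long a nightly backup takes to cross
it. Here's a rough Python sketch. The 2 TB nightly volume, the 1 Gbps link,
and the 70% usable-throughput factor are all hypothetical placeholders, not
figures from Dean's shop, and the sketch ignores compression, deduplication,
and protocol specifics.

```python
def transfer_hours(data_tb, link_mbps, efficiency=0.7):
    """Estimate hours needed to move data_tb terabytes over a link.

    data_tb    -- backup volume in decimal terabytes (assumed figure)
    link_mbps  -- nominal link speed in megabits per second (assumed figure)
    efficiency -- fraction of nominal speed actually usable (a guess)
    """
    bits_to_send = data_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits_to_send / usable_bits_per_second / 3600

def fits_window(data_tb, link_mbps, window_hours, efficiency=0.7):
    """True if the estimated transfer finishes within the backup window."""
    return transfer_hours(data_tb, link_mbps, efficiency) <= window_hours

# Hypothetical example: 2 TB nightly over a 1 Gbps link, 8 hour window.
hours = transfer_hours(2, 1000)
print(round(hours, 1))            # roughly 6.3
print(fits_window(2, 1000, 8))    # True
```

If the estimate doesn't fit the window, the choices are a faster link, less
data per night, or a longer window (and thus a worse RPO).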
(*) I'm aware that Broadcom (CA), Model9, and possibly Compuware
(Innovation Data Processing) also offer software products in this
particular market segment, and there might be others I'm not yet aware of.
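For option 2 with more than one cloud object storage pool, the recovery
decision boils down to "use the newest good backup in any pool you can still
reach." A minimal Python sketch of that selection logic follows. The Pool
class, its fields, and the pool names are purely illustrative; they are not
part of any vendor's product or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class Pool:
    name: str                              # illustrative label only
    reachable: bool                        # can the DR site reach it now?
    last_good_backup: Optional[datetime]   # None if no verified backup

def pick_restore_source(pools: List[Pool]) -> Optional[Pool]:
    """Return the reachable pool holding the newest good backup, if any."""
    candidates = [p for p in pools
                  if p.reachable and p.last_good_backup is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.last_good_backup)

# Hypothetical situation: the public pool has the newest backup but is
# unreachable, so the restore falls back to the private pool.
pools = [
    Pool("public-provider", False, datetime(2020, 1, 7, 22, 0)),
    Pool("private-provider", True, datetime(2020, 1, 6, 22, 0)),
    Pool("dr-site-operator", True, None),
]
source = pick_restore_source(pools)
print(source.name if source else "no usable backup")  # private-provider
```

The point of running backups to two or more pools is simply that this list
of candidates is less likely to come up empty on the day you need it.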