As of May 3, APAR OA20907 was opened by IBM to provide a temporary fix
to the DFSMSdss RESTORE problem in the form of an ADRDSSU patch byte that
can be set by an installation or on a specific invocation of ADRDSSU to
inhibit the reset of the DS1DSCHA bit on "RESTORE FULL" or "RESTORE
TRACKS". When available, this should provide a practical workaround for
us and others similarly exposed on DFSMSdss RESTORE.
For the long-term solution, IBM is considering as a possible future
enhancement something along the lines of, or a variant of, the proposed
change in default behavior for DFSMSdss as described in SHARE
requirement SSMVSS07009 (currently "open for discussion" on the SHARE
site) and a mirrored copy of that requirement as IBM Marketing Request
MR0409076057. This future enhancement would possibly include a change
in default RESTORE behavior based on use of "RESET" in DUMP, with a new
RESTORE option to allow for overriding default behavior.
This has been under continuing off-group discussion for the last month
among IBM and a number of people representing various SHARE-member
installations. With an accepted APAR, it looks like we will get a
circumvention in a timely fashion, and most likely there will be some
long-term solution that will eliminate the chance that others will get
bitten in the future.
I again want to thank John Chase for alerting ibm-main to this problem,
the various SHARE officers who were instrumental in starting the
off-group direct contacts with IBM, and Andrew Wilt of IBM for the
progress toward resolving this issue.
Joel C. Ewing
Sr. Technical Admin, Mainframe Systems
Data-Tronics Corp., Fort Smith, AR
Joel C. Ewing wrote:
After four days of experimenting with DSS and thinking about the
implications of DOC APAR OA20117, I felt it time to share some additional
results and thoughts on this with IBM-MAIN.
First of all, let me reiterate the basic exposure implied by OA20117:
If you are using DFSMSdss "DUMP FULL" without the "RESET" option --
which is the default usage, indicating the dump is not intended as a
replacement for individual dataset dumps -- to save the image of a DASD
volume and expecting at some point to use this dump with a "RESTORE
FULL" to move the volume to another DASD drive, as part of a Data Center
move or migration to new equipment, or for Data Center recovery at a
remote site, THEN MOST LIKELY YOU ARE CURRENTLY EXPOSED TO SOME FORM OF
DATA-LOSS!
This is true if you are using DFSMShsm with auto-backup enabled (many
sites), if you have DFSMShsm FSM (Fast Subsequent Migration) enabled
(fewer sites), if you have applications using DFSMSdss that use
"BY((DSCHA,EQ,YES))" as part of the selection criteria for data set
manipulation, or if you have any other vendor products or home-grown
applications in house that manage datasets or process datasets based on
the "Changed" bit in the VTOC. If any of these apply to your
installation, YOU ARE EXPOSED.
The crux of the problem is that the only practical way to make a
physical copy of a volume with DFSMSdss for moving the volume or
recovering it elsewhere is with a "DUMP FULL" physical dump, and if this
is recovered to another device with the obvious counterpart "RESTORE
FULL", the result is currently not an identical volume, but a volume
with all the VTOC "changed" bits on the volume reset.
This means that future decisions on the recovered system or moved volume
that are based on the changed bit will be in error. The effects range
from failure to take a required auto-backup (exposing users to data loss
when a dataset recovery point they expect to be there is not), to DFSMShsm
erroneously assuming a down-level ML2 version of a dataset is current
and scratching the most current version on primary DASD (exposing those
using DFSMShsm Fast Subsequent Migration to unpredictable data loss), or
failure at some unknown time in the future to select for processing some
datasets that should be selected by either 3rd-party vendor products or
in-house applications that rely on the "changed" bit. These effects are
subtle. Unless some user notices and reports a problem, they can easily
be overlooked; and if they aren't noticed until six months after the
RESTORE, there is little likelihood that DFSMSdss would have been
suspected over the more common possibility of "fuzzy user memory" of
some kind.
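
To make the auto-backup case concrete, here is a toy Python model. It is
not DFSMSdss or DFSMShsm code; the data set names and function names are
made up for illustration. It assumes only what OA20117 describes (RESTORE
FULL brings every "changed" bit back off) and that the backup tool selects
data sets whose "changed" bit is on.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    changed: bool  # models the "changed" bit (DS1DSCHA) in the Format-1 DSCB


def restore_full_current(dump_image):
    """Model the OA20117 behavior: every changed bit comes back off."""
    return [Dataset(d.name, False) for d in dump_image]


def select_for_backup(volume):
    """Model any tool that picks data sets whose changed bit is on."""
    return [d.name for d in volume if d.changed]


# Hypothetical volume contents at the time of the dump.
source = [
    Dataset("PROD.PAYROLL.MASTER", True),    # updated since its last backup
    Dataset("PROD.PAYROLL.ARCHIVE", False),  # already backed up
]

# DUMP FULL without RESET: the source volume is left as it was.
dump_image = [Dataset(d.name, d.changed) for d in source]
restored = restore_full_current(dump_image)

print(select_for_backup(source))    # backup candidates on the old volume
print(select_for_backup(restored))  # backup candidates after the move
```

In this model the old volume yields one backup candidate and the restored
volume yields none, so the updated data set silently misses its next
backup.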
There are two pending requests for a change to this behavior: SHARE
request SSMVSS07002, which asks for changing the RESTORE default to not
clear the "changed" bits; and Marketing Request MR0302074136, requesting
an option on "RESTORE FULL" to control the handling of the "changed"
bit. After considerable thought I don't believe either of these is the
cleanest or most correct solution. The most consistent solution should
be based on the principle that at the completion of a physical volume
dump, if you immediately restore that physical dump onto the same device
or onto a different device, the default behavior should be that all
in-use tracks on the target device, including the VTOC, should be
identical with those left on the source device. In other words, an
immediate restore on top of the device used to generate the dump
shouldn't change anything. This means that the treatment given the
"changed" bits should depend on whether "RESET" (which resets all
"changed" bits at the end of creating the dump) was specified on the
"DUMP FULL". If the dump was created with "DUMP FULL ... RESET", then
"RESTORE FULL" should clear all "changed" bits in the VTOC. If the dump
was created with "DUMP FULL" without the reset option, then by default
"RESTORE FULL" should leave all "changed" bits in the VTOC unaltered.
This does not preclude the possibility of also adding a RESTORE option
to change this behavior, but my point is that for "least astonishment"
the default behavior should be based on whether the DUMP was with or
without "RESET".
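
The proposed default can be written down as a small rule. The following
is only a sketch of my proposal in Python, not of anything IBM has
implemented; the function names and the idea of a "reset_used" flag
carried in the dump image are assumptions made for illustration. A volume
is modeled as a mapping from data set name to "changed" bit.

```python
def dump_full(volume, reset=False):
    """Return (dump_image, volume_after_dump) for a full-volume dump.

    The image keeps the changed bits as they were when the dump was taken,
    plus a hypothetical flag saying whether RESET was specified.
    """
    image = {"reset_used": reset, "changed": dict(volume)}
    # RESET turns all changed bits off on the source once the dump is done.
    after = {name: False for name in volume} if reset else dict(volume)
    return image, after


def restore_full_proposed(image):
    """Proposed default: mirror what the dump left behind on the source."""
    if image["reset_used"]:
        return {name: False for name in image["changed"]}
    return dict(image["changed"])


# Hypothetical volume contents.
volume = {"PROD.A": True, "PROD.B": False}

for reset in (False, True):
    image, source_after_dump = dump_full(volume, reset=reset)
    restored = restore_full_proposed(image)
    # An immediate restore should be indistinguishable from the source.
    print(reset, restored == source_after_dump)
```

Under this rule the comparison holds both with and without RESET, which
is the "an immediate restore shouldn't change anything" principle stated
above.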
I have opened a PMR with IBM arguing these points. The
initial response (as one might suspect) is that since the current
behavior has been present in DFSMSdss for a l-o-n-g time (despite the
fact that no one knew about it until the March 26 DOC APAR, the end
result of John Chase's data-loss PMR), IBM is disinclined to change
it unless there is a clear consensus within the DFSMSdss customer base
for a specific "design change" or "enhancement request". Our
installation's position is that since our management now knows we have a
data-integrity hole from using DFSMSdss in our Disaster Recovery design,
they aren't going to be willing to accept as a solution a program
enhancement process that may still not be resolved for more than 6 or 12
months, or potentially longer. I plan to create another SHARE request
in the near future with my modified proposal just to cover all bases and
give SHARE installations another place to cast a vote; but the bottom
line is that if you feel as I do that this ought to be handled as a
data-loss APAR and resolved sooner than a future enhancement, your
installation may need to rattle IBM's cages more directly to make it
clear this is not a request coming from just one or two installations.
ONE OTHER COMMENT. If your installation or any application you know of
actually depends on the bizarre and prior-to-March-26-undocumented
behavior of "RESTORE FULL" resetting the changed bits when restoring
from a "DUMP FULL" without "RESET" option, let this group and IBM know.
IBM believes you exist, but I have strong doubts.
Now, for some of the more bizarre (and imperfect) circumventions that I
have discovered that could be put in place pending a fix to DFSMSdss (if
and only if you have no other recourse):
One of the effects that OA20117 doesn't explain is that a RESTORE TRACKS
of the VTOC from a "DUMP FULL" dump file still resets the "changed"
bits even though no dataset tracks are restored. However, if you
RESTORE TRACKS of the VTOC while restoring no other dataset tracks from
a "DUMP TRACKS" that includes the VTOC tracks, the "changed" bits come
through unaltered. Another oddity is that although "RESTORE TRACKS" of
the VTOC from a "DUMP FULL" dump clears the changed bits, the dump file
itself still apparently contains some indication of which datasets
originally had the changed bit on. If you do a "RESTORE
DS(BY((DSCHA,EQ,YES))).." from a "DUMP FULL" dump file, then it will
actually restore only those datasets that originally had the "changed"
bit set, and in agreement with OA20117, since this is a dataset restore,
the changed bit will then be left on in the VTOC for those datasets. The
above suggests the following two circumventions, each with its own
problems:
(1) Back up the volume with DUMP FULL; then first RESTORE the volume with
"RESTORE FULL" to pick up all the tracks, including VTOC and IPLTEXT on
the volume; finally, restore just the changed datasets with "RESTORE
DS(BY((DSCHA,EQ,YES)))... REPLACE", excluding the VVDS and VTOCIX to
avoid "E" level errors, in order to correct the "changed" bits in the
VTOC for the datasets that previously had this bit on. The major
problem is performance. You have to read the dump file twice, and so
correct recovery could easily double your recovery time.
(2) Back up the volume with "DUMP FULL" immediately followed by "DUMP
TRACKS" to dump just the VTOC tracks to a separate dump file; restore
the volume with "RESTORE FULL" immediately followed by "RESTORE TRACKS"
to restore just the VTOC tracks. Problems: the volume-level enqueues
will be lost between the two DUMPs and between the two RESTOREs. If there is
any possible way something could slip in and do volume updates in these
two gaps you are potentially in deep doo-doo. The other problem is that
unless all your VTOC locations and sizes are identical and never change,
you will have to have some foolproof, automatic way of maintaining the
correct VTOC track extents in your "DUMP TRACKS" and "RESTORE TRACKS"
control statements, or again you could get very nasty results that might
not be found until you tried to recover unsuccessfully during DR.
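
The intent of the two circumventions can be traced with a toy Python
model. This is only a sketch under the observations reported above; it
models nothing but the "changed" bits, and every function name and data
set name in it is hypothetical. It does not model the enqueue gaps or the
VTOC extent bookkeeping that make approach (2) risky.

```python
def dump_full(volume):
    """Full dump: the image apparently still records the changed bits."""
    return dict(volume)


def restore_full(dump):
    """Current behavior: all changed bits come back reset."""
    return {name: False for name in dump}


def restore_changed_datasets(dump, target, exclude=()):
    """Circumvention (1), second pass over the same dump file.

    Only data sets flagged as changed in the dump are restored, and a
    data set restore leaves their changed bit on.
    """
    for name, changed in dump.items():
        if changed and name not in exclude:
            target[name] = True
    return target


def dump_vtoc_tracks(volume):
    """Circumvention (2): separate track dump of just the VTOC."""
    return dict(volume)


def restore_vtoc_tracks(vtoc_dump, target):
    """Track restore of the VTOC brings the changed bits through as-is."""
    target.update(vtoc_dump)
    return target


# Hypothetical volume contents.
source = {"PROD.A": True, "PROD.B": False, "SYS1.VVDS.VVOL001": False}

dump = dump_full(source)
vtoc_dump = dump_vtoc_tracks(source)  # assumes no update slipped in between

plain = restore_full(dump)
fixed_1 = restore_changed_datasets(
    dump, restore_full(dump), exclude=("SYS1.VVDS.VVOL001",)
)
fixed_2 = restore_vtoc_tracks(vtoc_dump, restore_full(dump))

print(plain == source, fixed_1 == source, fixed_2 == source)
```

In the model, the plain restore differs from the source while both
circumventions reproduce its changed bits. The real-world cost is what
the model leaves out: the second read of the dump file in (1), and the
unprotected gaps and extent maintenance in (2).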
An impractical approach attempted and dismissed:
Try to get a single dump file that could be restored with a single
restore without changing the "changed" bits in the VTOC. The only
possibilities that will handle all VTOC entries and IPLTEXT are RESTORE
FULL and RESTORE TRACKS. RESTORE FULL is only compatible with DUMP
FULL, and we already know that doesn't work. RESTORE TRACKS will work
with both DUMP FULL and DUMP TRACKS dump files, but we already know it
fails with a DUMP FULL dump. DUMP TRACKS with the max track range would
work with a RESTORE TRACKS with max track range, but this will be much
more expensive in time and dump file space than DUMP FULL, because
residual data in all unused tracks will be dumped as well. One could
attempt to analyze the VTOC and dynamically generate the DUMP TRACKS
control statement to dump only used tracks, but that makes this into a
very difficult problem, and the track ranges, which will be different
every time the volume is dumped, must be preserved along with the
dump file in order to specify the correct track ranges on the restore.
In addition, a RESTORE TRACKS that overwrites track 0 and the VTOC must
be targeted to a device that is pre-initialized to have the right volser
and a VTOC and VTOCIX identically sized and positioned, or you get nasty
messages about an unreadable volume with VTOC errors when a rebuild of
the VTOCIX is attempted. This approach has many more problems and risks
than either approach (1) or (2).
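
For a sense of what "analyze the VTOC and generate the track ranges"
would involve, here is a minimal Python sketch of just the range
arithmetic. It assumes the used extents have already been extracted
somehow as relative track numbers (the sample extents are invented) and
assumes 3390 geometry of 15 tracks per cylinder. It does not read a VTOC
and does not build any DFSMSdss control statement.

```python
TRACKS_PER_CYL = 15  # 3390 geometry


def merge_extents(extents):
    """Merge inclusive (first_track, last_track) extents into minimal ranges."""
    merged = []
    for first, last in sorted(extents):
        if merged and first <= merged[-1][1] + 1:
            # Overlaps or directly adjoins the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], last)
        else:
            merged.append([first, last])
    return [tuple(r) for r in merged]


def to_cchh(track):
    """Convert a relative track number to (cylinder, head)."""
    return divmod(track, TRACKS_PER_CYL)


# Hypothetical in-use extents, in no particular order.
extents = [(0, 0), (1, 14), (15, 44), (90, 104), (45, 59)]

ranges = merge_extents(extents)
cchh_ranges = [to_cchh(first) + to_cchh(last) for first, last in ranges]

print(ranges)       # relative track ranges to dump
print(cchh_ranges)  # the same ranges as (c1, h1, c2, h2)
```

The arithmetic is the easy part. The resulting list changes with every
allocation on the volume and would have to be saved with each dump file
and fed back, unaltered, to the matching restore.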
Chase, John wrote:
-----Original Message-----
From: IBM Mainframe Discussion List On Behalf Of Richards.Bob
Sent: Thursday, December 14, 2006 4:01 PM
No, and I just read it a few hours ago in an attempt to help you.
That DEFAULT behavior WAS NOT documented.
The subject DOC APAR closed today, 26 March.
I VIGOROUSLY RECOMMEND anybody who uses DFSMSdss full-volume or
track-range (aka "physical") DUMP / RESTORE for D/R, data movement, etc.
to READ AND UNDERSTAND this DOC APAR. You **ARE** at risk of losing
data or data integrity via DFSMSdss Full-volume or track-range (aka
"physical") DUMP / RESTORE if you have ANY OTHER PRODUCT that depends
upon the setting of the "change" bit in the Format-1 DSCB.
The DFSMSdss Level 2 rep also submitted Marketing Request #
MR0302074136, requesting that a "switch" (similar to the RESET keyword
on DUMP) be provided to allow the user to specify how DFSMSdss should
handle the "change" bit at RESTORE time. Those interested should add
their voices via appropriate channels.
SHARE members who have not done so, please vote on Requirement #
SSMVSS07002, which requests a design change to DFSMSdss' default
behavior on full-volume RESTORE.
-jc-
--
Joel C. Ewing, Fort Smith, AR [EMAIL PROTECTED]