Background: Over the Thanksgiving holiday weekend (U.S.) we relocated
our datacenter. We moved the mainframes' DASD volumes via DFSMSdss DUMP
FULL at the old site, followed by RESTORE FULL at the new site. On the
day the DUMP FULL of every volume was taken, a TSO user had recalled a
migrated PDS and created and saved a new member therein. At the new
site, following the RESTORE FULL of every volume and after the TSO
user's PDS had once again been migrated by DFSMShsm, the user recalled
the PDS, and noticed the new member he had added was missing. We had
previously configured DFSMShsm for "Fast Subsequent Migration" (FSM),
and we had specified this user's datasets as eligible for FSM.
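For the curious, the move mechanism itself was plain ADRDSSU full-volume
dump and restore. A minimal sketch of the two jobs follows; the volsers,
dataset names, unit names, and JOB statement details are made-up
placeholders, not our actual JCL:

//MOVEDUMP JOB (ACCT),'DUMP FULL',CLASS=A,MSGCLASS=X
//* Old site: physical full-volume dump of one DASD volume to tape.
//DUMP     EXEC PGM=ADRDSSU
//SYSPRINT DD SYSOUT=*
//DASD     DD UNIT=3390,VOL=SER=DEV001,DISP=OLD
//TAPE     DD DSN=MOVE.DUMP.DEV001,UNIT=TAPE,
//            DISP=(NEW,CATLG),LABEL=(1,SL)
//SYSIN    DD *
  DUMP FULL INDDNAME(DASD) OUTDDNAME(TAPE) -
       ALLDATA(*) ALLEXCP
/*

//MOVEREST JOB (ACCT),'RESTORE FULL',CLASS=A,MSGCLASS=X
//* New site: full-volume restore from that tape. COPYVOLID keeps
//* the dumped volser on the target volume.
//REST     EXEC PGM=ADRDSSU
//SYSPRINT DD SYSOUT=*
//TAPE     DD DSN=MOVE.DUMP.DEV001,DISP=OLD
//DASD     DD UNIT=3390,VOL=SER=DEV001,DISP=OLD
//SYSIN    DD *
  RESTORE FULL INDDNAME(TAPE) OUTDDNAME(DASD) -
       COPYVOLID PURGE
/*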
Timeline:
Day 0 (old site):
1. TSO user HRECALLs PDS on Development LPAR; creates and saves new
member.
2. z/OS (1.5) on the Development LPAR is shut down before DFSMShsm
automatic backups run, and a DFSMSdss DUMP FULL of that LPAR's DASD
volumes is started from the "sandbox" z/OS (1.5) LPAR in the Sysplex.
(The process was repeated for the Production LPAR later the same day,
including suspension of DFSMShsm automatic backups.)
Day 1 (new site): DFSMSdss RESTORE FULL is completed for all DASD
volumes belonging to the Development and Production LPARs (the
"sandbox" had been "moved" the previous week).
Days 1 - 3 (new site): Verification / acceptance testing of the "move"
is completed. The aforementioned TSO user did NOT participate in that
testing.
Day 4 (new site): The "primary" retention period for the TSO user's PDS
expires. DFSMShsm decides that the PDS has not been changed since its
last-known (to DFSMShsm) backup, so it performs FSM, "reconnecting" the
PDS to its previous (now "stale") ML1 copy.
Day 5 (new site, now "in production"): TSO user again HRECALLs the PDS
and discovers the member added on Day 0 is missing. We were able to
recover the "correct" copy of the PDS from the full-volume DUMP.
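For anyone facing the same cleanup: pulling a single dataset back out
of a full-volume dump tape is just a physical data set RESTORE. A
sketch, with made-up names throughout, using RENAMEU so the recovered
copy can sit next to the "live" (stale) one for comparison:

//RECOVER  JOB (ACCT),'PDS FROM DUMP',CLASS=A,MSGCLASS=X
//* Restore one dataset from the full-volume dump tape, renamed so it
//* does not collide with the stale copy already on DASD.
//STEP1    EXEC PGM=ADRDSSU
//SYSPRINT DD SYSOUT=*
//TAPE     DD DSN=MOVE.DUMP.DEV001,DISP=OLD
//SYSIN    DD *
  RESTORE DATASET(INCLUDE(USER1.SOURCE.PDS)) -
          INDDNAME(TAPE) -
          OUTDYNAM((DEV001)) -
          RENAMEU((USER1.SOURCE.PDS,USER1.RECOVER.PDS))
/*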
I opened a PMR with DFSMShsm Support to try to determine why the PDS at
the new site was missing the member added at the old site, and learned
(among other things) that DFSMShsm relied in part on the setting of the
"change" bit in the Format 1 DSCB to decide whether to "reconnect" (FSM)
a migrating dataset to its previous ML1 copy, or create a "fresh" ML1
copy. We and IBM each ran a few tests and observed that the "change"
bit was "off" in the DSCB for the dataset on the "new" volume after the
DFSMSdss RESTORE FULL. Indeed, **EVERY** "change" bit in **EVERY**
Format 1 DSCB on the target volume was "off" after a RESTORE FULL,
regardless of its setting on the source volume.
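For anyone who wants to check their own volumes: the flag in question
is the data-set-changed bit, DS1DSCHA, which (if I'm reading the
Advanced Services mapping correctly) is the X'02' bit of the DS1DSIND
byte, at offset X'5D' into the 140-byte Format 1 DSCB. An IEHLIST job
along these lines will dump the DSCB in hex so you can eyeball that
byte before and after a RESTORE FULL; the volser and dataset name are
placeholders:

//LISTF1   JOB (ACCT),'LIST F1 DSCB',CLASS=A,MSGCLASS=X
//* Dump the Format 1 DSCB for one dataset in hex. Look for DS1DSIND
//* (the byte containing the DS1DSCHA "change" bit, X'02').
//STEP1    EXEC PGM=IEHLIST
//SYSPRINT DD SYSOUT=*
//VOL1     DD UNIT=3390,VOL=SER=DEV001,DISP=OLD
//SYSIN    DD *
  LISTVTOC DUMP,VOL=3390=DEV001,DSNAME=(USER1.SOURCE.PDS)
/*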
The PMR was "handed off" to DFSMSdss Support, who later confirmed that
RESTORE FULL does indeed unconditionally "reset" **EVERY** "change" bit
on the target volume, and that it does so **by design** because the DUMP
FULL tape is "by definition, a backup" of each dataset on the source
volume. I noted that this behavior is both counter-intuitive and
UNDOCUMENTED anywhere in the DFSMSdss literature, and requested that
they reconsider this design point, at least in regard to RESTORE FULL.
They offered instead to open a DOC APAR to fully document the RESTORE
FULL behavior, and take a Marketing Request to offer a keyword "switch"
on RESTORE FULL by which the end user could specify whether to "reset"
the "change" bits.
I've accepted both offers. Additionally, I've initiated a SHARE
Requirement (SSMVSS07002, currently "open for voting") requesting the
original design change for RESTORE FULL along with full documentation of
the behavior regarding the "change" bits. Please read, consider and
vote on SSMVSS07002.
-jc-