No VM:Secure. The problem occurred on a small, special-purpose machine that is
accessed only via Secure TN3270. Only four terminal addresses are defined, and
all of them are logged on in a secure room. There are no network connections
except for an NJE link to our main VM system. It is used for submitting jobs and
sending files to the appropriate z/OS systems. It is one-way communication.
There isn’t any need for a heavyweight ESM.
This was a new error to me. The normal sequence of an IPL is:
1. Record 0/0/1 is read in to location 0. It contains the IPL PSW, IPL CCW1 and
   IPL CCW2.
2. A TIC to IPL CCW1 is done.
3. CCW1 reads the first record of the initialization program.
4. CCW2 is executed. It either reads the rest of the IPL program or TICs to a
   location in the record just read to load the rest of the init pgm.
5. When the channel program ends successfully, the IPL PSW is loaded and the
   init pgm loads and starts the O/S.
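For anyone inspecting a DDR dump by hand, the layout of record 0/0/1 described above can be sketched as a small decoder. This is only an illustration, assuming the record is available as raw bytes and that the CCWs are format-0 (command code, 24-bit data address, flags, 16-bit count). The function names and the sample bytes are made up for the example and do not come from any real volume.

```python
def decode_ccw(ccw: bytes) -> dict:
    """Decode one 8-byte format-0 CCW (assumed format for IPL CCWs)."""
    if len(ccw) != 8:
        raise ValueError("a CCW is exactly 8 bytes")
    command = ccw[0]
    return {
        "command": command,
        "address": int.from_bytes(ccw[1:4], "big"),   # 24-bit data address
        "chain_command": bool(ccw[4] & 0x40),         # CC flag
        "count": int.from_bytes(ccw[6:8], "big"),     # byte count
        "is_tic": (command & 0x0F) == 0x08,           # transfer in channel
    }


def decode_ipl_record(record: bytes) -> dict:
    """Split the first 24 bytes of the IPL record into PSW, CCW1 and CCW2."""
    if len(record) < 24:
        raise ValueError("IPL record must hold at least 24 bytes")
    return {
        "psw": record[0:8],
        "ccw1": decode_ccw(record[8:16]),
        "ccw2": decode_ccw(record[16:24]),
    }


# Hypothetical sample: a zero PSW, a read CCW chained to a TIC.
sample = (
    bytes(8)
    + bytes([0x06, 0x00, 0x10, 0x00, 0x60, 0x00, 0x10, 0x00])
    + bytes([0x08, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x01])
)
decoded = decode_ipl_record(sample)
print(decoded["ccw1"])
print(decoded["ccw2"])
```

Comparing the decoded CCW addresses and counts against a known-good volume is one way to tell whether record 1 itself is intact before looking at the record it points to.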
The allocation map is not used until after CP is started. We were failing when
step 4 was supposed to occur. The first record of the init pgm was the one that
was corrupted. Record 4, where the allocation map lives, was not touched.
Is there something in the system that arbitrarily rewrites these IPL records?
How about record 3, the volume label? The pseudo VTOC (records 5 and 6)?
As for the VM:Secure thing, I find that simply documenting how the product can
screw up a system is not only an unacceptable answer, it borders on the
repugnant. A strategic product that should always be working should never
violate the integrity of the system. There is no justification for it. The
changed allocation on the disk must be respected. Reallocating a disk is a
normal maintenance activity. If a documentation change is the answer, someone
needs to update the documentation for CPFMTXA/ICKDSF with a very stern warning
about the potential for disaster if VM:Secure is running when a disk is being
reallocated.
Would I be presumptuous in thinking that ending VM:Secure before reallocating
the disk would be a sufficient precaution? Even that would be a problem for us.
The Rules Facility is heavily used in our environment.
Regards,
Richard Schuh
-----Original Message-----
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED]] On Behalf Of Mike Walter
Sent: Monday, October 30, 2006 1:02 PM
To: [email protected]
Subject: Re: Corrupted IPL Record
Were you running VM:Secure on that system? Are the DRCT cylinders on that IPL
DASD? If so, this may help.
When VM:Secure starts up, it reads the whole allocation bit map of the DASD with
the source directory minidisk (usually: VMSECURE 01B0). Each time VM:Secure
rebuilds the object directory cylinders (msg: "VMXRXB0740I The dynamic REBUILD
has completed. Directory maintenance activity will now resume.") it completely
rewrites ALL of the allocation bitmap (as it was when VM:Secure came up) from
its cached copy, except for updates to the bits in the DRCT cyls.
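The effect of that write-back can be shown with a toy model. This is not VM:Secure's code or the real allocation map format. It assumes one entry per cylinder and a hypothetical `write_back` helper, purely to show why an allocation change made after the cache was taken is silently undone.

```python
def write_back(cached_map, drct_updates):
    """Return the map written to disk: the stale cache plus DRCT changes only.

    cached_map   -- per-cylinder allocation types captured at startup
    drct_updates -- {cylinder: type} changes the rebuild itself made
    (both names are hypothetical, for illustration only)
    """
    result = list(cached_map)
    for cylinder, kind in drct_updates.items():
        result[cylinder] = kind
    return result


# Map as it was when the cache was taken.
at_startup = ["PARM", "PARM", "PERM", "DRCT", "PERM"]
cached = list(at_startup)

# Later, an administrator extends PARM by one cylinder on the real disk.
on_disk = list(at_startup)
on_disk[2] = "PARM"

# A directory rebuild then rewrites the whole map from the cached copy.
after_rebuild = write_back(cached, {})

# Cylinders whose new allocation was lost in the write-back.
reverted = [
    cyl for cyl, (new, old) in enumerate(zip(on_disk, after_rebuild))
    if new != old
]
print(reverted)
```

In this model cylinder 2 reverts to PERM, which matches the symptom of a newly expanded PARM area returning to its old size.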
That bit us (excuse the pun) a couple of Sunday IPLs in a row when the newly
expanded PARM disk kept getting changed back to its old size. The back end of
the newly updated SYSTEM CONFIG happened to have 4K blocks allocated on the new
cylinders. When VM:Secure rewrote the PARM allocation map size/location (begin
location did not change, just the end), it chopped off the last half or so of
the SYSTEM CONFIG. CP happily came up without error because the truncation just
happened to be between SYSTEM CONFIG statements. To diagnose it, I wrapped
SYSTEM CONFIG with confirmatory messages issued as it runs:
  Say "Beginning: 'SYSTEM CONFIG' from MAINT's CF1 disk..."
  TOLERATE_CONFIG_ERRORS NO
  ... rest of statements...
  Say "Completed: 'SYSTEM CONFIG' from MAINT's CF1 disk."
Be careful of "TOLERATE_CONFIG_ERRORS NO". Don't just drop it in without trying
it live. There are non-syntactical errors which will pass CPSYNTAX (everyone
DOES run CPSYNTAX **EVERY TIME** after changing SYSTEM CONFIG, right?), but will
cause an IPL error. In our case, the missing 'Completed' statement confirmed the
suspicion. And... explained why the system came up so half-configured (missing
the last half or so of SYSTEM CONFIG).
After reporting it to CA, they said they would update the doc, showing how
VM:Secure can cause these sorts of problems.
By chance, I had a conversation with the CPSYNTAX developer about an open PMR
just last week. I suggested some type of new statement pairs which, if present,
must BOTH be present as the first and last non-comment records in SYSTEM CONFIG
(and perhaps IMBED files) to diagnose just this sort of error.
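Until something like that exists, the same check can be approximated offline against a copy of the file. The sketch below is an assumption-laden illustration, not a CPSYNTAX feature. It treats the two Say statements shown earlier as the marker pair, assumes comments are delimited by /* and */, and uses a hypothetical helper name.

```python
import re


def has_marker_pair(text, first_prefix, last_prefix):
    """True if the first and last non-comment records start with the markers.

    Assumes /* ... */ comments and that the markers never contain a
    comment delimiter themselves.
    """
    without_comments = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    records = [ln.strip() for ln in without_comments.splitlines() if ln.strip()]
    if not records:
        return False
    return (records[0].startswith(first_prefix)
            and records[-1].startswith(last_prefix))


BEGIN = 'Say "Beginning:'
END = 'Say "Completed:'

complete = "\n".join([
    "/* sample configuration, for illustration only */",
    "Say \"Beginning: 'SYSTEM CONFIG' from MAINT's CF1 disk...\"",
    "TOLERATE_CONFIG_ERRORS NO",
    "Say \"Completed: 'SYSTEM CONFIG' from MAINT's CF1 disk.\"",
])

# Simulate a truncation that happens to fall between statements.
truncated = "\n".join(complete.splitlines()[:3])

print(has_marker_pair(complete, BEGIN, END))
print(has_marker_pair(truncated, BEGIN, END))
```

A check like this only catches truncation of the copy it is run against, so it complements rather than replaces the Say messages issued at IPL.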
Mike Walter
Hewitt Associates
Any opinions expressed herein are
mine alone and do not necessarily
represent the opinions or policies
of Hewitt Associates.
"Schuh, Richard" <[EMAIL PROTECTED]>
Sent by: "The IBM z/VM Operating System" <[email protected]>
10/30/2006 02:32 PM
If this shows up twice, I apologize. I first sent it 2 hours ago and it hasn't
hit the archives as yet.
Over the weekend, we upgraded our final system to 5.2. As a part of the
migration, we changed old PARM extents to PERM and allocated new PARM extents.
When we tried to IPL the system, there were errors that led me to the conclusion
that the IPL program had been corrupted. Using DDR on another system, I observed
that record 0 0 2 had been completely wiped out. There were a few scattered bits
that were not 0 in the first 100-200 bytes, nothing that was any kind of
pattern, and the rest of the record was all 0. Records 1 and 3-6 were all as
they should have been. I had to run SALIPL to fix the IPL program.
I am not blaming CPFMTXA ALLOCATE because I think it highly unlikely that it was
the cause. I suspect that something else happened between the previous IPL and
yesterday's failed attempt. Has anyone else seen this kind of corruption or are
we, once again, unique?
Regards,
Richard Schuh