Good questions. If you think of any more, ask away. :) Answers below...
On Oct 30, 2009, at 1:07 AM, Mike Walter wrote:
The most obvious but least likely causes are the simplest to detect. Sorry, I have no answers, just diagnostic questions...
1) Is it possible that there were any minidisk definition overlaps?
Nope. When I add (or remove) DASD, I always compare the DISKMAP reports before IPLing the instance. In fact, here is the process I use. Note that I maintain the USER DIRECT file manually.
Identify the DASD to use.
If new, add it to ALLOC. Always format the volume with CMS; if I don't, LVM may detect previous volume signatures and get stupid. :)
Add the DASD to the guest, always with higher, sequential virtual device numbers.
IPL the instance, then activate and format the disks from the Linux side. I usually use YaST to do this.
mkinitrd, zipl, re-IPL.
It was at this point that LVM went stupid and started complaining that volumes could not be found. Linux, at this point, had rearranged the volume assignments for some reason. I don't have a clue why.
Brought the system down, removed the additional DASD, re-IPLed. Still stupid.
Ran mkinitrd and zipl, re-IPLed.
All was okay at this point.
Tried again with just one brand-new, never-used 3390-9 volume. Created a new LVM to add it to, added it, re-IPLed....
LVM went stupid again. This time I could not recover all the existing LVMs.
That is probably more detail than anyone is interested in. But to prove this out, I restored the entire image from tape, recreated all the LVMs exactly as they were, repeated the add-one-volume step, and yep, it got stupid again. Lost a different LVM, but still.
The only thing I can think of is that I breached some kind of limit on the number of volumes you can attach to a single zLinux instance.
Do you use a directory management product to guard against that? If so, are there any trap doors left open (e.g. full-extent minidisks that could have been linked R/W by another user)? If not, does DISKMAP or DIRMAP report any OVERLAPs?
There were no overlaps or other signals from z/VM that something was not okay. :)
2) Could an external or guest system have written to the disks (i.e., another z/VM system, z/OS system, etc. sharing the same set of DASD)?
Nope, just one LPAR up with z/VM. No other zLinux instances sharing the disks, etc.
3) Could another user have linked to one or more of the minidisks R/W?
4) Were there any hardware log errors?
I was the only z/VM interactive user. Other zLinux guests are logged in and active, but there is no overlap in DASD between the guests.
And just for fun, was it ONLY the LVMs that were blown away, or was it the 200 (IPL) and 201 (/usr or swap) disks, too?
Yep, only the LVMs were damaged. And all of the LVMs were affected.
BTW, from another listserver discussion: 'Linux' is a trademark of Linus Torvalds. It is not supposed to be prefaced or appended by anything. z/Linux and zLinux violate the trademark. 'Linux for System z', while verbose, does not. We should respect his trademark when using Linux publicly. I *believe* that IBM may own the trademark for z/anything (not 100% certain on that).
I had not heard of that. I think it is a little paranoid to worry about it, but I will try to pay attention to it, since it is obviously important to some folks.
Mike Walter
Hewitt Associates
The opinions expressed herein are mine alone, not my employer's.
From: "Paul Raulerson" <[email protected]>
Sent by: "The IBM z/VM Operating System" <[email protected]>
Date: 10/30/2009 12:23 AM
Please respond to: "The IBM z/VM Operating System" <[email protected]>
To: [email protected]
cc:
Subject: z/LInux, LVM, and minor disasters.
I had to add some additional DASD to a Linux instance over the weekend, and for whatever reason, it turned into a disaster. Linux somehow decided to rearrange all the DASD and blew away every single LVM I had on the machine. Just under half a terabyte of data went into some unrecoverable mode.
I'm pretty careful about this, and just build ~100 GB volumes out of 3390-9 volumes. The minidisk definitions are always ordered, and new DASD gets higher, sequential numbering. I always start my minidisk assignments at 200, so 200 is the IPL disk, 201 is either /usr or swap, depending, and 202 starts the data volumes.
I wound up building a new instance, installing a slightly updated version of Linux, and then restoring all the data with Tivoli from a VTL. There was no permanent damage done, except to my ability to sleep at night.
When I rebuilt the data partitions, I built them as RAID-0 partitions. (The Shark takes care of the real RAID, and I am not worried about data loss from that direction.)
Has anyone else run into this?
-Paul