|
At predefined intervals, the operating system checks devices
of a specific
type to determine if expected I/O interrupts have occurred. If an
expected
interrupt has not occurred across two of these checks, that interrupt
is considered
missing. The operating system then issues message IOS071I or IOS076E,
writes
a logrec data set error record, and tries to correct the problem. For
recurring
missing interrupts, the operating system issues message IOS075E
together with
message IOS076E or IOS077E to indicate the recurring condition on a
particular
device.
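The two-check rule described above can be pictured with a small model. This is an illustrative sketch only, not how IOS is implemented; the `PendingIO` class, the `run_mih_check` function, and the device number `0180` are hypothetical names introduced for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PendingIO:
    """Hypothetical model of an I/O request still waiting for its interrupt."""
    device: str
    checks_seen: int = 0  # interval checks this request has been outstanding for


def run_mih_check(pending: list[PendingIO]) -> list[str]:
    """Run one interval check; return devices whose interrupt is now treated as missing."""
    missing = []
    for io in pending:
        io.checks_seen += 1
        # An interrupt that is still outstanding at the second check is treated as missing.
        if io.checks_seen >= 2:
            missing.append(io.device)
    return missing


pending = [PendingIO("0180")]
first = run_mih_check(pending)   # first check: not yet considered missing
second = run_mih_check(pending)  # second check: reported as missing
```

In this model a request that stays outstanding keeps being reported on later checks, which loosely corresponds to the recurring condition described above.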
A feature of the IBM® 3990-6 and 9340 attached devices allows
MVS/ESA™ to automatically
identify a system in a multisystem environment that is holding a
reserve.
After every start pending MIH condition, the system attempts to
determine
whether the device is not responding because of a reserve to another
system.
If the device is reserved to another system, message IOS431I is issued
to
identify the system by its central processor serial number. If the
system
holding the reserve is a member of the same sysplex as the system
detecting
the MIH condition, message IOS431I includes the system name and the
LPAR ID,
if there is one.
For JES2 systems, when the reserve is held by a system in the
same sysplex,
the system attempts to obtain information about the job causing the
reserve
by routing a D GRS,DEV=devnum command to that system. JES2 systems
that have
JES3 installed must have JES2 started with the NOJES3 option
(CON=(xx,NOJES3))
to identify the job holding the reserve. Message ISG020I
identifies
the jobs holding the reserve on the failing system. The installation
can
use this information to determine what to do.
Some causes of missing interrupts are:
- An idle unit control block (UCB) with I/O requests queued
to it
- An outstanding I/O operation that should have completed
- An outstanding mount for a tape or disk
The intervals used by the operating system to determine
whether an expected
interrupt is missing vary from 15 seconds for DASD to 12 minutes for
3330
Disk Storage. An installation can define in the IECIOSxx parmlib member
the
time intervals for all devices in the I/O configuration. These
intervals
override the IBM-supplied defaults.
Notes:
- During IOS recovery processing, the system will override
your time interval
specification and may issue MIH messages and MIH logrec error records
at the
IOS-determined interval.
- During IPL (if the device is defined to be ONLINE) or
during the VARY
ONLINE process, some devices may present their own MIH timeout values,
via
the primary/secondary MIH timing enhancement, contained in the
self-describing
data for the device. The primary MIH timeout value is used for most I/O
commands;
however, the secondary MIH timeout value may be used for special
operations
such as long-busy conditions for long-running I/O operations. Any time a
user
specifically sets a device or device class to have an MIH timeout value
that
is different from the IBM-supplied default for the device class, the
value
will override the device-established primary MIH time value. This
implies
that if an MIH time value that is equal to the MIH default for the
device
class is explicitly requested, IOS will not override
the device-established primary MIH time value. To override the
device-established
primary MIH time value, you must explicitly set a time value that is
not equal
to the MIH default for the device class.
Note that overriding the device-supplied
primary MIH timeout value may adversely affect MIH recovery processing
for
the device or device class.
Please refer to the specific device's reference
documentation to determine if the device supports self-describing MIH
time
values.
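The override rule for the device-established primary MIH time value can be summarized as a small decision function. This is a sketch of the rule as stated in the note above, not IOS code; the function name `effective_primary_mih`, its parameters, and the sample values in seconds are hypothetical.

```python
from typing import Optional


def effective_primary_mih(
    device_primary: Optional[int],
    class_default: int,
    user_value: Optional[int],
) -> int:
    """Return the primary MIH time (in seconds) in effect under the rule in the note above."""
    if device_primary is None:
        # The device presents no self-describing MIH value:
        # an installation-specified value overrides the IBM-supplied default.
        return user_value if user_value is not None else class_default
    if user_value is not None and user_value != class_default:
        # Only an explicit value that differs from the class default
        # overrides the device-established primary value.
        return user_value
    # No explicit value, or one equal to the class default:
    # the device-established value stays in effect.
    return device_primary


# Hypothetical values: device presents 30 seconds, class default is 15 seconds.
kept = effective_primary_mih(30, 15, 15)        # equal to the default, device value kept
overridden = effective_primary_mih(30, 15, 20)  # differs from the default, user value wins
```

The second call shows why a value equal to the class default cannot be used to force the default onto such a device: the explicit value must differ from the default to take effect.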
Note:
If there are missing interrupts on the
devices that contain
the system residence (SYSRES) or the page volumes, the operator may not
receive
any message, because the needed operating system routines are pageable.
The
operator can learn about the missing interrupts by initiating restart
reason
1.
|