Yes, there was a change in the state of the channel: it failed while the
system was running. This caused a bit of havoc, as the OPERATOR console
and any other console that issued a command like VARY PATH or VARY CHPID
hung. They were only freed by someone taking the CHPID off at the
HMC. The second change of state occurred when IBM replaced a FICON card
and returned the hardware. I presume that the proper return path was
followed. I cannot be sure, since the data center is over 1000 miles
away. In any event, all 4K devices were sensed and made ONLINE to the
system.
 
A change of state like that is not an everyday occurrence. However,
if the z9-z/VM system is to be a near-non-stop system, then it must
withstand whatever may happen during normal operation.
That includes the failure and replacement of a redundant part. In this
case, there are four paths to the devices, so the failure and
replacement of one of them falls into that category. And the component
was successfully replaced. It is just that the devices that were
supposed to be offline were no longer offline following the repair.
 
I would also suppose that making the path available via the HMC would
cause a Configuration Change machine check that notifies CP that the
change has occurred and, further, that CP would handle the interrupt in
a reasonable manner.
 
Regards, 
Richard Schuh 

 

 

________________________________

From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Neubert, Kevin (DIS)
Sent: Monday, December 24, 2007 12:51 PM
To: [email protected]
Subject: Re: Offline Devices



Do you mean HMC?  If so, it appears to me this was caused by changing the
state of the channel path via the HMC/SE while the OS was running, which
is not recommended because the OS will not be notified.

 

Regards,

 

Kevin

 

________________________________

From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On
Behalf Of Schuh, Richard
Sent: Monday, December 24, 2007 10:27 AM
To: [email protected]
Subject: Offline Devices

 

Recently, we have had some problems with a FICON card. The path was
varied off from all devices and the CHPID varied offline. The CHPID was
then taken offline at the HSM and the part replaced. When the CHPID was
brought back online at the HSM, the ensuing device reconfiguration
interrupt apparently caused all 4096 devices to be brought online.
Unfortunately, only 365 of the devices were online when the hardware
activity started. The remaining devices are in the Devices_offline list
in the SYSTEM CONFIG file and are not supposed to be online to VM. In
this particular case, only one path out of four was affected by the
hardware problem.


I cannot put the devices in an ignore list because it is sometimes
necessary to make one of them available to VM. From the description, it
does not appear that making them not_sensed would help. In fact, it
would probably make the occasional need to make a device available more
complicated. Is there any way to have the devices stay offline in a
situation like this? 
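
Failing that, the cleanup after such an event amounts to a set
difference: the devices that are online now, minus the devices that were
meant to be online. A minimal sketch of that bookkeeping follows. It
does not prevent the devices from coming online; it only builds the list
of commands needed to put things back. The helper names and the device
number range starting at 1000 are hypothetical, chosen only to match the
counts above. VARY OFFLINE is the CP command for taking a real device
offline, and one command per device is generated here to avoid assuming
anything about range syntax.

```python
# Hypothetical helpers; names and device numbers are illustrative only.

def devices_to_vary_offline(online_now, online_before):
    """Return, sorted, the device numbers online now that were not online before."""
    return sorted(set(online_now) - set(online_before))


def vary_offline_commands(rdevs):
    """Format one CP VARY OFFLINE command per real device number (4 hex digits)."""
    return [f"VARY OFFLINE {rdev:04X}" for rdev in rdevs]


# Assumed layout: 4096 devices at 1000-1FFF, of which the first 365
# were online before the hardware activity started.
online_before = range(0x1000, 0x1000 + 365)
online_now = range(0x1000, 0x1000 + 4096)

extra = devices_to_vary_offline(online_now, online_before)
commands = vary_offline_commands(extra)

print(len(commands))  # 3731
print(commands[0])    # VARY OFFLINE 116D
```

In practice the two input sets would have to come from the actual
device status before and after the event, which this sketch does not
attempt to collect.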

 

Regards,
Richard Schuh 

 
