Yes, there was a change in the state of the channel: it failed while the system was running. This caused a bit of havoc, as it made the OPERATOR console, and any other console that issued any silly command like VARY PATH or VARY CHPID, hang. They were only freed by someone taking the CHPID off at the HMC.

The second change of state occurred when IBM replaced a FICON card and returned the hardware. I presume that the proper return path was followed. I cannot be sure, since the data center is over 1000 miles away. In any event, all 4K devices were sensed and made ONLINE to the system.

A change of state like that is not an everyday occurrence. However, if the z9-z/VM system is to be a near-non-stop system, then it must withstand whatever may happen during normal operation. That includes the failure and replacement of a redundant part. In this case, there are four paths to the devices, so the failure and replacement of one of them falls into that category. And the component was successfully replaced. It is just that the devices that were supposed to be offline were no longer offline following the repair.

I would also suppose that making the path available via the HMC would cause a Configuration Change machine check that notifies CP that the change has occurred, and, further, that CP handles the interrupt in a reasonable manner.

Regards,
Richard Schuh

________________________________
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Neubert, Kevin (DIS)
Sent: Monday, December 24, 2007 12:51 PM
To: [email protected]
Subject: Re: Offline Devices

Do you mean HMC? If so, it appears to me this was caused by changing the state of the channel path via the HMC/SE while the OS was running, which is not recommended, as the OS will not be notified.

Regards,
Kevin

________________________________
From: The IBM z/VM Operating System [mailto:[EMAIL PROTECTED] On Behalf Of Schuh, Richard
Sent: Monday, December 24, 2007 10:27 AM
To: [email protected]
Subject: Offline Devices

Recently, we have had some problems with a ficon card. The path was varied off from all devices and the chpid varied offline. The chpid was then taken offline at the HSM and the part replaced. When the chpid was brought back online at the HSM, the ensuing device reconfiguration interrupt apparently caused all 4096 devices to be brought online. Unfortunately, only 365 of the devices were online when the hardware activity started. The remaining devices are in the Devices_offline list in the SYSTEM CONFIG file and are not supposed to be online to VM. In this particular case, only the one path out of four was affected by the hardware problem.

I cannot put the devices in an ignore list because it is sometimes necessary to make one of them available to VM. From the description, it does not appear that making them not_sensed would help. In fact, it would probably make the occasional need to make a device available more complicated.

Is there any way to have the devices stay offline in a situation like this?

Regards,
Richard Schuh
