Re: Recommendations for RECOVERY options

Lizette Koehler Thu, 29 Dec 2016 06:43:18 -0800

As pointed out, there may be people taking year-end holidays.

Also, list members are not required to answer any question asked.  They can,
when they have time, provide guidance.


If your question is urgent, the best option is to contact the vendor of the
equipment or software you are having issues with.  They can provide support
that the list cannot, since we do not have access to your shop, your
configuration, or your details.

The list is more like a group of people sitting around chatting about the
mainframe and supporting functions.  

I would suggest you may wish to start a problem ticket with your vendor(s) on
this.


On other lists, the z13 has been reported to have issues, so I would
definitely start with IBM and z13 support.  This may be another manifestation
of microcode issues.


Lizette


> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]] On
> Behalf Of James Peddycord
> Sent: Thursday, December 29, 2016 6:46 AM
> To: [email protected]
> Subject: Re: Recommendations for RECOVERY options
> 
> NTAC:3NS-20
> 
> Our SAN people said they saw no errors on the switch (do I believe this?
> IDK), which is dedicated to the mainframe. The problem was a bad cable
> between the switch and the storage, which both mainframes share, so there was
> an issue on one CHPID on each system.
> 
> In this forum with so many people who have so many opinions, I can't believe
> that nobody is offering suggestions for the RECOVERY parameter when asked.
> 
> The default is:
> RECOVERY,PATH_SCOPE=DEVICE,PATH_INTERVAL=10,PATH_THRESHOLD=100
> This is what caused our pain.
> z/OS has to see 100 errors per minute for 10 minutes before taking the path
> off of a single device. This was happening to every device.
> 
> On our test system I started with:
> RECOVERY,PATH_SCOPE=CU,PATH_INTERVAL=5,PATH_THRESHOLD=50
> z/OS has to see 50 errors per minute for 5 minutes before taking the path off
> of every device in the LCU.
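
For readers comparing the two settings quoted above, here is a minimal Python
sketch of the rule as Jim describes it: the path is removed once the per-minute
error count has met PATH_THRESHOLD for PATH_INTERVAL consecutive minutes. It
models that description only, not the actual z/OS implementation. The function
name, the sample error rate, and the assumption that a minute below the
threshold resets the count are all illustrative.

```python
from typing import Optional, Sequence


def minute_path_goes_offline(
    errors_per_minute: Sequence[int],
    path_interval: int,
    path_threshold: int,
) -> Optional[int]:
    """Return the 1-based minute at which the path would be removed, or None.

    Assumption: a minute with fewer than path_threshold errors resets the
    run of qualifying minutes. This mirrors the description in the thread,
    not documented z/OS internals.
    """
    consecutive = 0
    for minute, count in enumerate(errors_per_minute, start=1):
        if count >= path_threshold:
            consecutive += 1
        else:
            consecutive = 0
        if consecutive >= path_interval:
            return minute
    return None


# Hypothetical error stream: a steady 60 errors per minute for 12 minutes.
steady_errors = [60] * 12

# Default values quoted in the thread: PATH_INTERVAL=10, PATH_THRESHOLD=100.
default_result = minute_path_goes_offline(steady_errors, 10, 100)

# Values tried on the test system: PATH_INTERVAL=5, PATH_THRESHOLD=50.
test_result = minute_path_goes_offline(steady_errors, 5, 50)

print("default settings:", default_result)
print("test settings:", test_result)
```

In this model, a steady 60 errors per minute never trips the default settings,
while the test-system settings trip at minute 5.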
> 
> I was hoping to see some real world examples that work for others.
> 
> Jim
> 
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]] On
> Behalf Of Alan (GMAIL) Watthey
> Sent: Wednesday, December 28, 2016 11:55 PM
> To: [email protected]
> Subject: Re: Recommendations for RECOVERY options
> 
> Jim,
> 
> As the network guy who looks after the SAN I would not be expecting my z/OS
> guys to do anything in this situation.  In fact z/OS cannot see our whole SAN
> as I have other things on it (e.g. ISLs, backend tapes).
> Fortunately, the Brocade switches are dedicated to the mainframes and devices
> used by the mainframes here.  The Brocade will see issues that it recovers
> from well before z/OS sees anything.
> 
> Of course, I don't know exactly what your problem was and what your Brocade
> would have seen but I'd suggest the bottleneckmon command to detect latency
> issues.  Running the porterrshow command from time to time would also give you
> a good idea as to whether your SFPs and fibres are performing as well as they
> should.  If you have the Fabric Vision/Watch licenses then you can do fancy
> stuff like fencing errant ports before they impact anything.
> 
> 
> Regards,
> Alan Watthey
> -----Original Message-----
> From: James Peddycord [mailto:[email protected]]
> Sent: 28 December 2016 4:56 pm
> Subject: Recommendations for RECOVERY options
> 
> NTAC:3NS-20
> We had a situation with a bad cable that resulted in a huge performance impact
> due to the default way that z/OS (we are at 1.13) handles error recovery on
> FICON paths.
> The symptoms were many (thousands) of IOS050I messages in the task's joblog,
> followed by an IOS450E message, which took the path offline to a single
> device.
> This was happening for every device (around 3000) that the affected path was
> attached to.
> As soon as I saw the messages I configured the CHPID offline and the problem
> stopped.
> We have put in automation that will immediately configure a CHPID offline as
> soon as a single IOS450E message is detected, and now I am experimenting with
> RECOVERY options.
> IBM recommended setting RECOVERY,PATH_SCOPE=CU, setting PATH_INTERVAL to 1,
> leaving PATH_THRESHOLD=10, and adjusting from there.
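
The automation Jim mentions (configure the CHPID offline on the first IOS450E)
could be sketched roughly as below. This is only an illustration in Python,
not the automation product or rules actually used at his site. It assumes the
message carries the device number and CHPID as "dddd,cc" right after the
message ID, and the sample console lines are made up for the example. It only
builds the operator command text (CF CHP(xx),OFFLINE) and does not issue
anything.

```python
import re
from typing import Iterable, List

# Assumed layout: "IOS450E dddd,cc,..." where dddd is the device number and
# cc is the CHPID, both in hex. Check this against your own console log.
IOS450E_PATTERN = re.compile(
    r"\bIOS450E\s+([0-9A-F]{4,5}),([0-9A-F]{2})\b", re.IGNORECASE
)


def chpid_offline_commands(console_lines: Iterable[str]) -> List[str]:
    """Build one CF CHP(xx),OFFLINE command per CHPID named in an IOS450E."""
    seen_chpids = set()
    commands = []
    for line in console_lines:
        match = IOS450E_PATTERN.search(line)
        if match is None:
            continue
        chpid = match.group(2).upper()
        # React to the first IOS450E only; later ones for the same CHPID
        # would just repeat the same command.
        if chpid not in seen_chpids:
            seen_chpids.add(chpid)
            commands.append(f"CF CHP({chpid}),OFFLINE")
    return commands


# Made-up console lines, for illustration only.
sample_lines = [
    "IOS050I CHANNEL DETECTED ERROR ON 1A00,5C",
    "IOS450E 1A00,5C, NOT OPERATIONAL PATH TAKEN OFFLINE",
    "IOS450E 1A01,5C, NOT OPERATIONAL PATH TAKEN OFFLINE",
]

commands = chpid_offline_commands(sample_lines)
for command in commands:
    print(command)
```

Tracking the CHPIDs already handled keeps the sketch from generating the same
command once per device, which matters when around 3000 devices share the
affected path.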
> 
> Due to the paperwork involved with making any change in our environment, I
> would like to implement this with a minimum of 'adjustment'.
> 
> Does anyone have any recommendations?
> We are running on z13s, 16G FICON through Brocade switches to IBM DS88xx DASD.
> 
> Thanks,
> Jim
> 

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
