The plot thickens. Be sure to mention to management that you have a few
thousand of your closest friends with literally millennia of cumulative
experience working on the issue with you :-)

Of course, working with IBM is the better path, I think. Still, it might
be fun to try to recreate the problem by running and cancelling that
test job.  




-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On
Behalf Of JE Thinnes
Sent: Thursday, January 08, 2009 8:58 AM
To: [email protected]
Subject: Re: MVS 4 minute 'outage'

Thanks for all the suggestions.

Update.  We experienced another 'hang' lasting 2 minutes.  LOGREC
indicated SFTABN (S0222) on the same low priority batch job before both
hangs.  The programmer indicated the job was doing many inserts/updates
to various DB2 tables before it was canceled (this was on a test DB2
system).

We opened an incident with IBM - they indicated a DB2 roll back could
not have caused the hang of the Console address space.

We are still pursuing this with IBM.

Are there any z/OS 1.9 uniprocessor configs out there?  Have any default
settings changed in 1.9 that I should look at?  I am reviewing a
uniprocessor white paper from Dec 2006.  Any other ideas??



Answers to other posters' questions:

LOGREC did not indicate any ABEND071.

Environment.  2096-R01 (1 CP with 1 zIIP and 2 IFL).  4 LPARs (2 z/OS, 2
LINUX).  Only 2 are active (the z/OS 1.9 in question and 1 test LINUX).
z/OS does not run under z/VM.

IBM reports no 'phone home' events.  No messages on the HMC. 

Problems happened at 10:47-10:51 and 13:28-13:30.

SMF/RMF INTVAL(30) SYNCVAL(30) at top and bottom of hour.  Nothing of 
consequence ended before the problem (except the DB2 test batch job 
mentioned above).

The highest priority batch job we have is IMP=4.

Not sure about the DASD caching stats in RMF.

The DASD is an EMC DMX4 (new as of Dec 08).  It is shared with the other
LPARs and some open systems.  The new DASD is connected via 4 FICON.
This replaced a DMX2 connected by 24 ESCON.  The devices are mirrored to
an off site device.  However, there were no reported DASD hardware or
communication problems.

No IPLs during the hangs.

No one hit the stop button.

The new DMX4 has a 4 minute MIH setting.  The second problem was 2 
minutes long.
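
As a rough cross-check of the two observations above, the sketch below
computes the length of each reported hang window and compares it with
the 4 minute MIH setting.  The timestamps are the ones quoted earlier in
this message and are only accurate to the minute, so the durations are
approximate.  The names HANGS, MIH, duration and compare_to_mih are made
up for illustration; nothing here reads real LOGREC or RMF data.

```python
from datetime import datetime, timedelta

# Hang windows as reported in this message (same day, minute granularity).
HANGS = [("10:47", "10:51"), ("13:28", "13:30")]

# MIH setting reported for the new DMX4.
MIH = timedelta(minutes=4)


def duration(start: str, end: str) -> timedelta:
    """Return the elapsed time between two HH:MM stamps on the same day."""
    fmt = "%H:%M"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


def compare_to_mih(hangs, mih):
    """For each window, return (start, end, length, length < mih)."""
    return [(s, e, duration(s, e), duration(s, e) < mih) for s, e in hangs]


for start, end, length, shorter in compare_to_mih(HANGS, MIH):
    minutes = int(length.total_seconds() // 60)
    print(f"{start}-{end}: {minutes} min, shorter than MIH: {shorter}")
```

With the times quoted above, the first window comes out at about the
MIH interval and the second at about half of it, which is why a single
MIH timeout does not obviously account for both hangs.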

There were TCPIP error messages after the hang - they were attributed to
being the victim, not the cause.

 

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
