Thoughts interspersed below:

On Tue, 2 Oct 2007 18:24:14 +0000, John P Donnelly <[EMAIL PROTECTED]>
wrote:

>we just moved from V1R4 to V1R7 29SEP07
>we had a Test LPAR executing V1R7 and a Prod LPAR executing V1R4
>these two happily coexisted with GRS as PLEXCFG=MONOPLEX, and system
>logger files defined with PLEX5 and PLEX1
>
>with the Prod LPAR under V1R7, we IPLed the Test LPAR (V1R7)
>and this is the last thing displayed before the Test LPAR went dead and
>the Prod LPAR just waited while recovering
>
>ISG011I SYSTEM CPU5 - JOINING GRS COMPLEX
This message means that this system has detected that the system named CPU5
is joining this GRS complex.  All global ENQ activity (that is, ENQ/DEQ
requests for SYSTEMS-level resources) is suspended until the JOIN completes.
                       
>*$HASP9201 JES2 MAIN TASK WAIT DETECTED AT ISGNLPA +0099DE 891   
> DURATION-000:00:12.97 PCE-CKPT     EXIT-NONE JOB ID-NONE        
>*$HASP9207 JES2 CHECKPOINT LOCK HELD 892                         
> DURATION-000:00:17.99    
Best guess (without viewing the code) is that the JES2 main task issued a
GRS request (an ENQ or DEQ) and is waiting (probably for the JOIN to
complete).  "ISG" is the module prefix for GRS component modules.
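
As a rough illustration of reading message prefixes, here is a minimal Python
sketch that maps the message IDs quoted in this thread to their owning
component.  The prefix table and the name component_for are assumptions made
for illustration, limited to the prefixes seen here; this is not an
IBM-supplied list.

```python
import re

# Prefix-to-component table, limited to prefixes that appear in this thread.
# Illustrative assumption only; not a complete or official list.
PREFIXES = {
    "$HASP": "JES2",
    "ISG": "GRS (global resource serialization)",
    "IOS": "I/O supervisor",
    "IEF": "scheduler/allocation",
}

# Optional '*' flag, a 3-4 letter prefix (optionally led by '$'),
# a 3-4 digit number, and an optional one-letter type code.
MSG_ID = re.compile(r"^\*?(\$?[A-Z]{3,4})(\d{3,4})([A-Z])?\b")


def component_for(line):
    """Return (message_id, component) for a console line, or None."""
    # Drop any leading email quote marker and padding before matching.
    m = MSG_ID.match(line.lstrip("> ").rstrip())
    if not m:
        return None
    prefix, number, suffix = m.group(1), m.group(2), m.group(3) or ""
    return prefix + number + suffix, PREFIXES.get(prefix, "unknown")
```

Lines that do not start with a message ID, such as an operator command,
return None.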

If you're good at dump reading, you could take a dump of JES2 and GRS and
probably figure out what the GRS request is and where it was requested from.   
>
>there was also some squawking about a CTC following the previous
>
>D GRS,SYSTEM                                                          
>IOS000I 030F,**,SIM,**,**06,,,,GRS                                    
>IEF196I IOS071I 030F,05,GRS, MISSING CHANNEL AND DEVICE END           
>IOS071I 030F,05,GRS, MISSING CHANNEL AND DEVICE END 924               
>ISG046E CTC 030F DISABLED DUE TO HARDWARE ERROR  CODE=05              
>VARY 030F,OFFLINE COMPONENT:SCSDS MODULE:ISGBTC PURPOSE:DISABLE CTC
Looks like an I/O error occurred on CTC 030F (IOS071I reports a missing
channel and device end), after which GRS disabled the CTC (ISG046E).

   
>ISG022E SYSTEM CPU1 DISRUPTED GLOBAL RESOURCE SERIALIZATION DUE TO 929
>COMMUNICATION FAILURE - GLOBAL RESOURCE REQUESTORS WILL BE SUSPENDED
The I/O error causes GRS to go into ring recovery.  Further delays in global
resource processing will occur.  Recovery of the members will be governed by
your GRSCNF parameters.
  
>ISG047I CTC 030F DISABLED
Yes, this is bad news from the GRS perspective.  In a non-sysplex GRS ring,
GRS counts on the CTCs to communicate between members of the GRS complex. 
Do you have alternate CTCs defined to handle soft or hard failures?
                                             
>
>but we really do not think a problem exists with the CTC, rather a
>definition is incorrect
Could be true, however that distinction is not interesting to GRS.  It
attempted I/O down the CTC and it didn't work.
 
>
>CA-MIM is also in the mix
Although MIM is a related product, it is unlikely to be doing anything that
keeps GRS from communicating down the CTC.

Question:  Why are you using both MIM and GRS?

>
>thoughts?             
Investigate why the CTC failed to work properly.
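
One way to start is to pull the CTC-related messages out of the console log
and line them up by device.  The Python sketch below does that for the
ISG046E and IOS071I lines quoted above.  The patterns are inferred only from
those quoted lines, and ctc_events is a hypothetical helper name, so treat
this as a starting point rather than a parser for every variant of these
messages.

```python
import re

# Patterns inferred from the messages quoted in this thread only (assumption).
ISG046E = re.compile(
    r"ISG046E CTC (?P<dev>[0-9A-F]{4}) DISABLED DUE TO HARDWARE ERROR"
    r"\s+(?P<detail>CODE=[0-9A-F]{2})"
)
# The two middle comma-separated fields are skipped, not interpreted.
# A trailing number, if present, is dropped from the text.
IOS071I = re.compile(
    r"IOS071I (?P<dev>[0-9A-F]{4}),[^,]*,[^,]*,\s*"
    r"(?P<detail>[A-Z ]+?)(?:\s+\d+)?$"
)


def ctc_events(lines):
    """Return a list of (message_id, device, detail) in log order."""
    events = []
    for raw in lines:
        # Drop any leading email quote marker and trailing padding.
        line = raw.lstrip("> ").rstrip()
        for msg_id, pattern in (("ISG046E", ISG046E), ("IOS071I", IOS071I)):
            m = pattern.search(line)
            if m:
                events.append((msg_id, m.group("dev"), m.group("detail")))
                break
    return events
```

Note that an IOS071I echoed under IEF196I is reported as a second event for
the same device; de-duplicate if that matters for your analysis.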
                         
>
>John Donnelly
>z/OS Systems Services
>National Semiconductor
>Corporation
>2900 Semiconductor Drive
>Santa Clara, CA 95051
>PH: 408-721-5640
>Email: [EMAIL PROTECTED]

Scott Fagen
Enterprise Systems Management
