> After using Health Checker for z/OS for several months now, I'm having
> doubts about some checks, so please feel free to comment :)

I just cannot resist. But first: have you checked the archives? Most of the checks you talk about here I have complained about in the past. Documented in the archives.
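For anyone tuning the same checks along with this thread: the Health Checker modify commands involved look roughly like this. A sketch only; HZSPROC is the conventional started-task name (substitute your own), and changes made this way do not survive a Health Checker restart unless also put into an HZSPRMxx policy member:

```
F HZSPROC,DISPLAY,CHECKS
F HZSPROC,DEACTIVATE,CHECK=(IBMGRS,GRS_CONVERT_RESERVES)
F HZSPROC,DELETE,CHECK=(IBMGRS,GRS_CONVERT_RESERVES)
```

The first lists all checks and their state, the second stops a check from running while leaving it defined, and the third removes it entirely.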
> Check: ASM_LOCAL_SLOT_USAGE
> We're using 4 page data sets in our test LPARs, which are ~540k
> tracks. Adding additional ones is always an option, but is it worth
> it? Maybe change the warning threshold? Thoughts?

The way I had set it up was to let the check put out its message once, and after that only once per day, for the life of the IPL. Several discussions on IBM-MAIN have shown that the old 30% recommendation has never been withdrawn, so I felt it important enough to keep the check active. Eventually I ended up with a whole lot of 'storage' reporting that I used to fight for more real storage for the LPARs that needed it.

> Check: GRS_CONVERT_RESERVES
> Comment: Still haven't used the GRS ENQ/DEQ/RESERVE Monitor to check
> what reserves our systems are using. I was wondering, has anyone
> tried to convert all RESERVEs to ENQs? Were any applications
> problematic?

I deleted that check. We share two catalogs outside the sysplex, and in that case you *must not* convert reserves. In my opinion, since GRS *knows* that those catalogs are shared (after all, there is a certain construct detailed in the GRS planning manual to be put into the RNLs), this check should have the intelligence to turn itself off without customer intervention.

> Check: XCF_SIG_STR_SIZE
> Comment: This one just raised a wonder in my head. It's not showing
> an exception, rather a few warnings in the text of the check - that
> our signalling structures can't support the configuration specified
> with MAXSYSTEM. This is because the sysplex where this occurs was
> big once, but now it's just a few LPARs. How could one reduce the
> MAXSYSTEM value while preserving other data in the CDSs? If I format
> a new CDS with a smaller MAXSYSTEM number, it won't play nice with
> the existing ones.

As Mike said (I am always glad when someone remembers my posts), there is NO WAY to decrease MAXSYSTEM dynamically, for integrity reasons. That is even described in one of the books.
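Going back to GRS_CONVERT_RESERVES for a moment: the RNL construct for catalogs shared outside the sysplex is, if memory of the GRS planning manual serves, along these lines. A sketch only; the catalog data set names are placeholders:

```
/* GRSRNLxx: convert catalog reserves to global ENQs in general  */
RNLDEF RNL(CON)  TYPE(GENERIC)  QNAME(SYSIGGV2)
/* ...but keep hardware reserves for the two catalogs shared     */
/* outside the sysplex (the exclusion RNL is searched first)     */
RNLDEF RNL(EXCL) TYPE(SPECIFIC) QNAME(SYSIGGV2) RNAME(SHARED.UCAT.ONE)
RNLDEF RNL(EXCL) TYPE(SPECIFIC) QNAME(SYSIGGV2) RNAME(SHARED.UCAT.TWO)
```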
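On MAXSYSTEM itself: the value is fixed at the time the CDS is formatted with the IXCL1DSU utility, which is why there is nothing to tweak afterwards. A hedged sketch of the relevant job step; plex name, data set name, volume and counts are all placeholders:

```
//FORMAT   EXEC PGM=IXCL1DSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINEDS SYSPLEX(PLEX1)
    DSN(SYS1.XCF.CDS01) VOLSER(CDS001)
    MAXSYSTEM(4)
    CATALOG
    DATA TYPE(SYSPLEX)
      ITEM NAME(GROUP) NUMBER(100)
      ITEM NAME(MEMBER) NUMBER(200)
/*
```

Formatting a fresh pair with a smaller MAXSYSTEM is exactly the cold-start scenario: the new CDSs cannot simply be swapped in under a running sysplex.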
If you want to decrease MAXSYSTEM, you have to really cold-start the full sysplex on fresh couple data sets. Be aware, though, that with a design change in z/OS 1.9 IBM now *requires* all couple data sets in the sysplex to have the *same* value of MAXSYSTEM, as it no longer denotes just the capacity of the sysplex but functions as an index. We cold-started onto fresh sysplex and CFRM CDSs twice a year, so other than discovering that design change the hard way, I was able to run that check. If you're happy with your signalling structures, on the other hand, just delete the check.

The requirement Mike mentioned will NOT help you with this check, though, if it ever gets implemented. Even if you could clean up old systems from the CDSs, that would only address the indexing issue; it would still not reduce MAXSYSTEM, which is what this check uses for computing the SIG_STR_SIZE, on the assumption that you really want to have this number of systems in your plex.

> Check: IXGLOGR_ENTRYTHRESHOLD
> Comment: Another "regular" check that pops up, almost always from
> the RRS MAIN logstream. Whenever I see this, I check the CF
> structure, which always shows around 40-60% of entries in use. Also,
> the structure is stuck at the initial size and AUTOALTER is disabled
> (by some recommendation, I think...). Will the RRS structure
> auto-expand if needed without z/OS (AUTOALT) intervention? Any best
> practices to recommend here? Also, sometimes when I check the
> "Count" column in the check's text, it's empty (no number is shown),
> but the exception is there... O.o ?

Disabling AUTOALTER for list structures was probably my recommendation: I had a case where AUTOALTER changed the ratio of entries to elements in a signalling structure (when one system was shut down for an immediate reIPL), so that at reIPL time my sysplex no longer had full connectivity. I was fairly badly attacked by IBM and Cheryl Watson for this recommendation, Cheryl Watson not even citing the full reasoning when she cited it.
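For completeness, AUTOALTER is controlled per structure in the CFRM policy, maintained with the IXCMIAPU administrative data utility; turning it off looks roughly like this. A sketch only; policy, structure and CF names plus sizes are placeholders:

```
DATA TYPE(CFRM) REPORT(YES)
  DEFINE POLICY NAME(CFRMPOL1) REPLACE(YES)
    STRUCTURE NAME(IXC_SIG1)
              INITSIZE(16384)
              SIZE(32768)
              ALLOWAUTOALT(NO)
              PREFLIST(CF1,CF2)
```

The updated policy is then activated with SETXCF START,POLICY,TYPE=CFRM,POLNAME=CFRMPOL1.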
I still feel it is better not to have XES make my sysplex lose full connectivity, so AUTOALTER was turned off. Whenever I checked for stalled offloads (which is essentially what this LOGR check tries to warn about), the 'real mechanism' that does the offload had already kicked in and done it. Given that logger itself has a lot of 'red' messages warning about stalled offloads, I just deleted this check.

> Check: RRS_MUROFFLOADSIZE
> This seems just fine, but I'm getting an error, not a warning, that
> the change will be effective upon the new offload data set
> allocation. What's wrong here? (LOGR CDS FORMAT LEVEL: HBB6603)

No clue. I had deleted all of the RRS checks, as they appeared to me to be duplicates.

> Check: XCF_CDS_SPOF
> Comment: For this one, I just want to confirm my understanding of
> the messages. I'm seeing component indicators =
> 30000000_00000000_00000000 on all the messages which cause this
> check to generate a high-severity (by default) exception. Checking
> DOC APAR OA28958, it seems that indicator is telling me that IBM
> thinks owning a single book in the CEC is a single point of failure
> (well, it is in some way, but... hmh?). Am I reading this right?
> Adding a book is definitely not an option, so if that is the case,
> this one will need to stay disabled.

Yes, you are reading this right. That check is deleted in our setup, too. We are aware of it but don't have the hardware (and money) to accommodate the concept.

Best regards,
Barbara

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN