Some information posted for Art Eisenhour from Tivoli Advanced Support.

Let me add some background for those not familiar with shared USS file systems, and provide guidance for this issue in the future.
TWS requires two directories in USS file systems: one for binary executable code and one for work areas. It is recommended that these two directories each reside in their own separate file systems. This facilitates maintenance of the code, by allowing an unmount of the current level and a mount of the new, and it facilitates reallocation of the work-area file system should it be at risk of running out of space.

As Jim points out, these file systems should be owned by the z/OS Sysplex member where the TWS End-to-End server is running, which must be the same system where the Controller is running. In short, if the TWS Controller is moved or recovered to another Sysplex member, ownership of the TWS file systems should be transferred to the same image.

File system ownership can be moved automatically by z/OS, through file system automove definitions, in the case where the owning system is shut down or fails. File system ownership can also be moved through the use of automation or operator commands. Automation can be used to move file system ownership as part of a TWS Controller move process, or when a standby Controller becomes active. Automation might trigger off of the Controller active message EQQN013I. Alternatively, a prestart step in the End-to-End server start process could cause the move; the only requirement is that the move occur before the End-to-End server is started.

*Note that the ability to move the TWS started tasks to different Sysplex members comes standard with IBM System Automation for z/OS.
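As a rough sketch of the two-file-system layout described above, the BPXPRMxx MOUNT statements might look something like the following. The data set names, mount points, and version directory are hypothetical; substitute your installation's own names.

```
/* TWS binary code -- mounted read-only, so maintenance can  */
/* unmount the current level and mount the new one           */
MOUNT FILESYSTEM('OMVS.TWS.BINDIR')
      TYPE(ZFS) MODE(READ)
      MOUNTPOINT('/usr/lpp/TWS/V8R3M0')
      AUTOMOVE

/* TWS work directory -- read/write; should be owned by the  */
/* Sysplex member where the End-to-End server runs           */
MOUNT FILESYSTEM('OMVS.TWS.WRKDIR')
      TYPE(ZFS) MODE(RDWR)
      MOUNTPOINT('/tws/wrkdir')
      AUTOMOVE
```

AUTOMOVE here covers only the case where the owning system leaves the sysplex; moving ownership when the Controller itself is moved still requires automation or operator action, as described above.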
A move of file system ownership could be added to the SA definitions in the Prestart Phase for the End-to-End server.*

For more information on managing USS file systems, reference these publications:

z/OS UNIX System Services Planning, GA22-7800-13
z/OS UNIX System Services Command Reference, SA22-7802-09

"When mounting file systems in the sysplex, you can specify a prioritized system list to indicate where the file system should or should not be moved when ownership of the file system changes due to any of the following:
- A soft shutdown request has been issued.
- Dead system takeover occurs (when a system leaves the sysplex without a prior soft shutdown).
- A PFS terminates on the owning system.
- A request to move ownership of the file system is issued."

Information specific to TWS use of the USS file systems can be found in the publication Tivoli Workload Scheduler for z/OS Installation Guide Version 8.3, SC32-1264-03, in the section "Configuring for end-to-end scheduling in a SYSPLEX environment":

"Having a shared HFS in a sysplex configuration means that all file systems are available to all systems participating in the shared HFS support. With the shared HFS support there is no I/O performance reduction for an HFS mounted read-only (R/O). However, the intersystem communication (XCF) required for shared HFS may affect the response time on read/write (R/W) file systems being shared in a sysplex. For example, assume that a user on system SYS1 issued a read request to a file system owned R/W on system SYS2. Using shared HFS support, the read request message is sent via an XCF messaging function. After SYS2 receives the message, it gathers the requested data from the file and returns the data using the same request message. In many cases, when accessing data on a system which owns a file system, the file I/O time is only the path length to the buffer manager to retrieve the data from the cache.
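To illustrate the kind of operator action that automation (triggered by EQQN013I, or run as a prestart step) might issue, ownership can be displayed and moved with standard z/OS console commands along these lines. The file system name and system name below are hypothetical:

```
/* List mounted file systems and their current owners        */
D OMVS,F

/* Move ownership of the TWS work file system to the system  */
/* where the Controller and End-to-End server now run        */
SETOMVS FILESYS,FILESYSTEM='OMVS.TWS.WRKDIR',SYSNAME=SYS1
```

The SETOMVS FILESYS command takes effect sysplex-wide; the key point is simply that it completes before the End-to-End server is started.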
On the contrary, file I/O to a shared HFS from a client which does not own the mount requires additional path length to be considered, plus the time involved in the XCF messaging function. Increased XCF message traffic is a factor which can contribute to performance degradation. For this reason, it is recommended for system files to be owned by the system where the end-to-end server runs.

On z/OS systems, the shared zFS capability is available: all file systems that are mounted by a system participating in shared zFS are available to all participating systems. When allocating the work directory in a shared zFS, you can decide to define it in a file system mounted under the system-specific zFS or in a file system mounted under the sysplex root. A system-specific file system becomes unreachable if the system is not active. To make good use of the takeover process, define the work directory in a file system mounted under the sysplex root and defined as automove."

Susan

On Tue, Aug 12, 2008 at 2:06 PM, Jim Marshall <[EMAIL PROTECTED]> wrote:

> Working with IBM Level 2 or maybe 3, we now understand what is causing the
> excessive CPU time being used by the Distributed component of Tivoli
> Workload Scheduler. I will review the scenario:
>
> We are running IBM 2096-O02 (36 MSU) and 2096-T03 (95 MSU) machines in a
> Parallel Sysplex, where TWS runs on the "O02" system (the smaller of the
> two). TWS is scheduling work in the Parallel Sysplex, and there is also a
> distributed component scheduling for 3-4 Windows Servers. Historically it
> is interesting, for TWS had its roots in an IBM product called OPC
> (Operations Planning and Control), which did z/OS and distributed
> scheduling using "Trackers". It worked very well using little CPU time.
> OPC morphed into Tivoli Workload Scheduler for z/OS, and IBM bought a
> company called Maestro which did distributed scheduling. The two products
> were merged and Trackers went away.
> It took IBM a few years to fully integrate the two products. This brings
> us down to the present and the performance issues encountered.
>
> TWS for z/OS runs separately from another started task for distributed
> TWS, called TWSE2E. TWSE2E was seen taking about 3 MSUs' worth of the O02,
> when the system used to run around 28-29 MSUs max in a month. IBM
> researched the issue and came forth with an explanation which is not
> highlighted in any of the Tivoli manuals, as far as we can read. TWSE2E
> executes its programs in the O02's USS system and has files defined in a
> zFS file system. If that zFS file system is not owned by the LPAR where
> TWS is running, all the I/O must go through XCF in the Parallel Sysplex,
> generating the extraordinary amounts of CPU time seen as being used by
> TWSE2E in that LPAR. The recommendation now is to always have the zFS
> file system mounted to the LPAR where TWS is operating (otherwise TWSE2E
> will eat your lunch, dinner, etc.). When we switched TWS's zFS file
> system back to the TWS LPAR, the CPU consumption dropped to almost
> nothing.
>
> I can understand the recommendation, and it now raises some
> considerations to ponder:
>
> 1. When a TWS LPAR is taken down, ownership of its zFS file system is
> automagically transferred to some other LPAR, and it is not your choice
> which one (another interesting discussion could follow this line). So
> when the TWS LPAR is IPL'ed, operationally one must ensure the proper
> commands are issued to bring back ownership of TWS's zFS file system.
>
> 2. One can implement all of #1 in "automation" if one is running some
> sort of automation package; a good case for getting one.
>
> 3. Keep in mind this is not a Parallel Sysplex problem but a zFS
> challenge.
>
> 4.
> I just have to wonder: if all this is caused by I/O for TWSE2E having to
> go through XCF to get to the other LPAR where the zFS is owned, then why
> not the WAIT associated with I/O, versus the heavy, heavy CPU load caused
> by this I/O (3-4 Windows Servers which get about 30-40 jobs per day)?
>
> Note: I just have to believe there is more to the story, and it may not
> be a TWS problem, but maybe TWS exploiting something in USS and zFS which
> is a bad design.
>
> POSTSCRIPT: Things are back to using an acceptable amount of CPU, and
> everyone is older and wiser.
>
> Jim
>
> P.S. I wonder how many other z/OS USS implementations are using excessive
> CPU because of the ownership of some zFS file system. I will be on the
> watch for something like it in the future.
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
> Search the archives at http://bama.ua.edu/archives/ibm-main.html

