Re: [CMS-PIPELINES] Trap error in stage STARMON

Rob van der Heij Fri, 08 Nov 2019 00:21:10 -0800

On Fri, 8 Nov 2019 at 01:54, van Sleeuwen, Berry wrote:

> Hi All,
>
> In the past when STARMON didn’t get all data in time it abended, and the
> entire pipeline with it, with an RC=313. I have coded the REXX exec to
> restart the MONWRITE machine when that happens.
>
> I now have the 313 error but the PIPELINE didn’t abend. In fact, the PIPE
> is still alive but STARMON doesn’t process any records after this event.
> Has this behavior changed in the newer PIPE version? I had been using the
> upstream runtime version in z/VM 5.4 and 6.3. Obviously in 6.4 and 7.1 I
> now use the IBM supplied version.
>
> FPLIUS313E IPRCODE Message was purged received on IUCV instruction.
> FPLMSG003I ... Issued from stage 1 of pipeline 5.
> FPLMSG001I ... Running "STARMON MONDCSS SHARED".
> FPLSMG313E IPRCODE 00000939 received on IUCV instruction.
> FPLMSG003I ... Issued from stage 1 of pipeline 5.
> FPLMSG001I ... Running "STARMON MONDCSS SHARED".
>
> What would be the best way to handle the abend? Is there a way to end the
> PIPE so that the REXX code can restart the PIPELINE?
>

Berry,

I normally see this when the pipeline processing the monitor records has
been held up too long (for example, waiting at a MORE... prompt). You may be
able to avoid that with an ELASTIC stage after STARMON (and a GATE to
terminate it). When STARMON terminates because its output was severed (or
through the immediate command), there shouldn't be any such messages.
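A minimal sketch of that layout as a REXX exec, assuming a hypothetical output file MONREC DATA A and a hypothetical immediate command STOP. GATE sits directly after STARMON so that closing the gate severs STARMON's output; ELASTIC then buffers records so a slow consumer does not hold STARMON up. I have not checked whether GATE also ends by itself when STARMON stops on its own (so that IMMCMD is released too); HELP PIPE GATE and HELP PIPE IMMCMD are the place to confirm that.

```rexx
/* MONPIPE EXEC (hypothetical name): read monitor data via STARMON, */
/* buffer it with ELASTIC, and stop cleanly through a GATE.         */
/* Typing the immediate command STOP makes IMMCMD write a record to */
/* the primary input of GATE; GATE then terminates, which severs    */
/* the output of STARMON so that it ends without the 313 messages.  */
'PIPE (end \)',
   'immcmd STOP',                   /* hypothetical command name     */
   '| g: gate',                     /* primary input: close signal   */
   '\ starmon mondcss shared',      /* monitor records from the DCSS */
   '| g:',                          /* secondary streams of the gate */
   '| elastic',                     /* buffer for a slow consumer    */
   '| >> MONREC DATA A'             /* hypothetical output file      */
exit rc
```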

This isn't an ABEND; it's just this particular pipeline stage terminating
because things didn't go as planned. As long as the rest of the pipeline is
set up properly, it will also wind down, and you end up back in the REXX
code after the PIPE or CALLPIPE command that ran the pipeline. You can check
the return code there and decide what to do.
You can even write your own REXX wrapper around STARMON that simply
restarts the pipeline on return code 313, as long as the business logic
doesn't mind that some data was late or lost.
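A sketch of the simplest form of such a wrapper: an exec that reruns the whole PIPE command. It assumes, as described above, that the PIPE return code is the 313 set by STARMON, and it appends to a hypothetical file MONREC DATA A so that data from earlier runs is kept.

```rexx
/* RESTMON EXEC (hypothetical name): rerun the monitor pipeline     */
/* whenever STARMON gives up with return code 313, on the premise   */
/* that late or lost data is acceptable to the business logic.      */
do forever
   'PIPE starmon mondcss shared',   /* monitor records from the DCSS */
      '| elastic',                  /* buffer for a slow consumer    */
      '| >> MONREC DATA A'          /* hypothetical file, appended   */
   if rc <> 313 then leave          /* any other result: stop here   */
   say 'STARMON ended with return code 313; restarting the pipeline'
end
exit rc
```

This restarts immediately; if the 313 keeps coming back right away, a short pause before the next PIPE would keep the loop from spinning.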

You probably also had CP messages indicating that you didn't consume the
data in time. There are also CONFIG options to tell CP how long to keep the
data for you, depending on the size of the MONDCSS and the configured size
of the partitions in it. You could also look at the number of event records
generated and see whether you want to disable a few domains that you don't
need.

There are some things "in the pipeline" to teach STARMON to write the
equivalent of MONWRITE data, which would also let you compress the data
before writing it to disk. Along with that, I envision controls that let you
terminate STARMON after it has written a complete block of sample records,
to avoid the incomplete data you get when STARMON is terminated by a severed
output stream.

Sir Rob the Plumber
