How many of us crave Pipelines when working with other systems?

I have seen at least two, probably three (maybe more) implementations of
Pipelines for systems other than CMS or MVS. Never could get into them
because they required heavy (IMO) supporting infrastructure. They're not
portable. They're not at all integrated with the systems they run on.
I've never seen a Pipelines implementation that could stand on its own.
(Other than the original, duh.)

But I believe it can be done.

What follows is a compromise. It doesn't have all the power of CMS/TSO
Pipelines, but should function well and (most important) _port widely
and easily_. I describe it in Unix/Linux terms, "POSIX" speak, and
envision a pure C implementation. This ports to any Unix and should
readily port to Windows too.
Here's the idea.

First, _control pipeline interpretation_. Get away from whatever
pipelining the target operating systems provide natively. (Get away from
shell pipes.) For Unix and Linux, define a 'pipe' verb which is external
to the command shell.

A POSIX Pipeline might look something like ...

    pipe '< fn1.ft1' \
        '| pad 80' \
        '| sort 1-40 a 41-80 d' \
        '> fn2.ft2'


The point is that the OS command interpreter does not interpret the
pipeline spec. In a shell, just quote the pipeline and you're golden.
The 'pipe' command then interprets the pipeline spec. The example above
necessarily varies a bit from how we would state things in REXX, but it
is delightfully close.

The 'console' stage reads from stdin and writes to stdout, providing a
bridge between shell pipes and he-man pipes.
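So, for example, you could bridge a shell pipeline through a 'pipe'
pipeline like this (the stage operands here are just illustrative):

    ls -l | pipe 'console' \
        '| sort 1-40 a' \
        '| console' | less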

The 'pipe' command spawns child processes, one for each stage.
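A minimal sketch of that spawning loop, assuming the pipeline spec has
already been parsed into per-stage argument vectors (the names here are
mine, purely illustrative):

    #include <unistd.h>
    #include <sys/wait.h>

    /* Illustrative only: spawn one child process per parsed stage. */
    /* stage_argv[i] is the argv[] for stage i; nstages the count.  */
    void spawn_stages(char **stage_argv[], int nstages)
    {
        for (int i = 0; i < nstages; i++) {
            pid_t pid = fork();
            if (pid == 0) {
                /* child: connector fds would be set up here first */
                execvp(stage_argv[i][0], stage_argv[i]);
                _exit(127);     /* exec failed */
            }
        }
        while (wait(NULL) > 0) ;  /* parent: reap all stages */
    }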

Second, and this is central, _define connectors using a pair of
traditional Unix pipes_ for each. On their own, Unix pipes don't convey
record boundaries or flow control. But arrange for a simple
bi-directional handshake and it works. One Unix pipe carries /data/
(and statistics) /downstream/. The other conveys /flow control
upstream/. The connector is instantiated with file descriptors
"read#1" and "write#2" handed over to the producer and "read#2" and
"write#1" handed over to the consumer.

    int fd1[2], fd2[2];
    rc = pipe(fd1); /* control: fd1[0] (read) to producer, fd1[1] (write) to consumer */
    rc = pipe(fd2); /* data: fd2[0] (read) to consumer, fd2[1] (write) to producer */


When the producer wants to write a record, he waits until the consumer
is ready. The consumer sends "STAT" upstream, meaning "tell me what you
have". The producer then sends information about the record: how many
bytes, and maybe other details. With that, the consumer can allocate
memory on demand to hold records of any size. The consumer then sends
"PEEK" or "READ", to which the producer responds with the record
contents. When the consumer decides he has consumed the record, he
says "NEXT" (or "DONE" or some such).
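To make the handshake concrete, here is one way the consumer's side
might go. The single-byte command codes and the fixed-size length
header are my own assumptions, not a settled wire format, and
error/short-read handling is omitted:

    #include <stdlib.h>
    #include <unistd.h>

    /* Assumed command bytes sent upstream on the control pipe. */
    #define CMD_STAT 'S'   /* "tell me what you have"        */
    #define CMD_READ 'R'   /* send the record and consume it */
    #define CMD_NEXT 'N'   /* done with this record          */

    /* Consumer side: ctlfd is the write end of the control pipe, */
    /* datafd the read end of the data pipe. Returns a malloc'd   */
    /* buffer holding one record; *len receives its length.       */
    char *consume_record(int ctlfd, int datafd, size_t *len)
    {
        char cmd = CMD_STAT;
        write(ctlfd, &cmd, 1);           /* ask what's waiting   */
        read(datafd, len, sizeof *len);  /* producer answers     */
                                         /* with record length   */
        char *buf = malloc(*len);        /* allocate on demand   */
        cmd = CMD_READ;
        write(ctlfd, &cmd, 1);           /* ask for the contents */
        read(datafd, buf, *len);         /* receive the record   */
        cmd = CMD_NEXT;
        write(ctlfd, &cmd, 1);           /* release the producer */
        return buf;
    }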

The underlying POSIX read() and write() functions on the file
descriptors are wrapped in ...

  * output()
  * readto()
  * peekto()


 ... which handle our "records". I've actually coded this much. It
works. So now I'm asking for help because I'm stuck.
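For a sense of shape, the wrappers might look something like this;
these signatures are my guesses at a natural C rendering, not the
actual coded versions:

    #include <stddef.h>

    /* Send one record of 'len' bytes downstream on output connector 'n'. */
    int output(int n, const void *buf, size_t len);

    /* Copy the next record into 'buf' (up to 'max' bytes) and consume
       it; returns the record length, or -1 at end of stream. */
    long readto(int n, void *buf, size_t max);

    /* Like readto(), but leave the record there for re-reading. */
    long peekto(int n, void *buf, size_t max);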

Dispatching is left to the operating system kernel. Each stage is just a
process. I hear a lot of requests for threading. Threading is great! But
it requires a common address space. So I propose that we let the kernel
dispatch processes /just like it does already/. Using existing
infrastructure makes this implementation broadly portable.

Third, stages learn about their _initial connectors via the
environment_. The 'pipe' command interprets the pipeline, instantiates
the connectors, and spawns the stages. Each input connector is defined
by a pair of Unix file descriptors. Each output connector is defined by
a mirrored pair of Unix file descriptors. The file descriptors can be in
any order (they are usually not sequential). So the 'pipe' command puts
clues into reserved environment variables unique to each stage.

    PIPE_INPUT_1="5,6"
    PIPE_OUTPUT_1="9,8"


 ... or maybe ...

    PIPEFDS="*.INPUT.1:5,6;*.OUTPUT.1:9,8"


Either way, streams can be /named or enumerated/ just like with CMS
Pipelines.
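For illustration, a stage might recover its descriptors at startup
like this, assuming the per-stage variable form and taking the data fd
first and the control fd second (the ordering is my assumption):

    #include <stdio.h>
    #include <stdlib.h>

    /* Recover the fd pair for input connector 1 from the environment. */
    int get_input_fds(int *datafd, int *ctlfd)
    {
        const char *spec = getenv("PIPE_INPUT_1");
        if (spec == NULL) return -1;
        /* e.g. "5,6": data (read) fd, then control (write) fd */
        if (sscanf(spec, "%d,%d", datafd, ctlfd) != 2) return -1;
        return 0;
    }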

Consumers and producers can push other stages onto their input or output
just like with CMS Pipelines. A new connector is created, and the
existing connector (along with the other side of the new connector) is
handed over to the added stage.
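Mechanically, pushing a stage onto one's output might go something
like this sketch; the helper name is mine, and advertising the fds to
the new stage (via the environment, as above) is glossed over:

    #include <unistd.h>

    /* Sketch: producer pushes a new stage onto its output.          */
    /* out_data/out_ctl are the producer's current output connector; */
    /* afterward the producer writes to the pushed stage instead.    */
    void push_output_stage(const char *path, char *const argv[],
                           int *out_data, int *out_ctl)
    {
        int data[2], ctl[2];
        pipe(data);             /* new connector: data downstream  */
        pipe(ctl);              /* new connector: control upstream */

        if (fork() == 0) {
            /* child (the pushed stage): reads the new connector's */
            /* consumer side, writes the producer's old connector, */
            /* which it inherits.                                  */
            close(data[1]); close(ctl[0]);
            execv(path, argv);
            _exit(127);
        }

        /* parent: the old connector now belongs to the child;  */
        /* keep the producer side of the new connector.          */
        close(*out_data); close(*out_ctl);
        close(data[0]); close(ctl[1]);
        *out_data = data[1];    /* write end of new data pipe   */
        *out_ctl  = ctl[0];     /* read end of new control pipe */
    }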

Bonus feature: _'pipe' for USS or OpenVM would punt to the real
thing_. Sure, it could do what's described above, but why pussy-foot
around when you've got the best right there under your Unix workalike?
So there would be a 'pipe' program for OpenVM (and USS) that behaves the
same as the POSIX Pipelines verb for the sake of shell scripts and
what-not. It would interpret Unix-flavor options and switches but then
hand off the rest of the work (including interpreting the actual
pipeline) to the CMS/TSO PIPE command.


There is a need.
Where I work, our architect is a former VMer. In recent email, he shared
a quick pipeline with the team and said this about it ...

    I generated this because I could feel my Pipes skills rusting,
    and that's _always A Bad Thing_.


Ya think?
There is a need.


Wondered about names for this thing: "ductwork" came to mind. But in US
English, "duct" and "duck" are too often mixed up and we don't want
people mixing up a powerful idea with "tape". (I cringe to hear the name
of "the handy man's secret weapon" butchered.) Or maybe "plenum", so
that's in the subject line of this note. Or possibly "waveguide"? (for
those of us who are into radio) Or maybe just call it "Power Pipes".

-- R; <><
