Warning: kind of a long-ish reply coming up.

> What exactly is wrong with the ECB/WAIT/POST mechanism? I think it has
> always worked great. It's got some pretty difficult limitations in cross
> memory mode (but it still works!), but otherwise I'm not sure where you
> get "awful".

Yeah, it works FSVO "works", but it is a 1960s design where the only
real concern was a wait and a post without much real asynchronous
behavior going on. Check the SVC numbers (1 and 2): pretty early on in
that design cycle, wouldn't you say? Back then nobody thought of, or even
cared about, all of the failure cases, and there are loads of them. Here
are only a few.

Where do I start... how about the fact that an "ECB" is just 4 bytes (on
a word boundary) of storage? There is no indication anywhere that those
particular 4 bytes are really part of a serialization interface and not
(say) 4 bytes in someone else's control block or working storage. 

When you issue a WAIT macro, Mr Wait only checks that each ECB is in the
right key for you to wait on it and that it does not appear to have
already been "posted". The big scientific check is whether the 40 bit is
on. If so, he just returns immediately. That's why the famous
"quick post" algorithm works.

Otherwise, WAIT is going to put the caller's RB into a wait state and
blast X'80abcdef' into the ECB, where "abcdef" is (or would be) the
address of the waiting RB. Of course, if that address wasn't really an
ECB, you have an overlay. In most (but not all) cases that would be
considered a programming error, so let's not quibble over it.
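
For anyone who hasn't stared at this in a dump, here's a toy Python model of the WAIT side and of quick post, under loudly stated assumptions: "storage" is a bytearray of big-endian fullwords, the RB address is a 24-bit number per the X'80abcdef' picture above, the key check isn't modelled at all, and CS is faked with a plain compare-and-store because the model is single-threaded. The function names are made up for illustration; this is a sketch, not z/OS code.

```python
import struct

WAIT_BIT = 0x80000000  # X'80': a WAIT is outstanding on this word
POST_BIT = 0x40000000  # X'40': the word has been POSTed


def load(mem: bytearray, off: int) -> int:
    return struct.unpack_from(">I", mem, off)[0]


def store(mem: bytearray, off: int, value: int) -> None:
    struct.pack_into(">I", mem, off, value)


def wait(mem: bytearray, off: int, rb_addr: int) -> bool:
    """Toy WAIT: returns True if the caller would be suspended.
    The key check is not modelled, and nothing asks whether the word is an ECB."""
    if load(mem, off) & POST_BIT:
        return False  # already "posted": return at once
    store(mem, off, WAIT_BIT | (rb_addr & 0x00FFFFFF))
    return True


def compare_and_swap(mem: bytearray, off: int, old: int, new: int) -> bool:
    """Single-threaded stand-in for the CS instruction."""
    if load(mem, off) != old:
        return False
    store(mem, off, new)
    return True


def quick_post(mem: bytearray, off: int, code: int) -> bool:
    """Hypothetical quick post: True if the word was posted without SVC 2,
    False if a waiter is present and a real POST is needed."""
    while True:
        old = load(mem, off)
        if old & WAIT_BIT:
            return False
        if compare_and_swap(mem, off, old, POST_BIT | (code & 0x00FFFFFF)):
            return True


mem = bytearray(b"\x00\x00\x00\x0012345678")  # word 0: a real ECB; words 1-2: data
print(quick_post(mem, 0, 7), wait(mem, 0, 0x001000))  # True False
print(wait(mem, 4, 0xABCDEF), mem[4:8].hex())          # True 80abcdef
```

The second line is the overlay: "1234" was never an ECB, but nothing stopped wait() from parking on it. Note also that in this model any word whose first byte happens to have the X'40' bit on (ASCII "J", for instance) looks "already posted" to wait().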

Now suppose the ECB happens to be in storage that the caller didn't own
(say it's owned by a parallel task) and the storage owner terminates.
Oops again. Now your task is waiting on an ECB that doesn't even exist
anymore, and no deity could wake it up. It is literally and permanently
toast, and the only way out for that task is to detach it, which tends
to be kind of draconian for the work that was supposed to be running on
that task. This is a fairly common error in multitasking apps where
parallel tasks serialize with each other via WAIT/POST.

Now let's turn our attention to POST. Folks who routinely look at dumps
(guilty yer 'onor) tend to recognize things that "look like" an ECB in
wait status because of the X'80abcdef' pattern. But if an ECB isn't
currently being waited on, then it's just 4 bytes of storage and the
contents could be anything at all. Arguably an ECB is only really an ECB
when it is being waited on.

Now Mr POST isn't fussy. He's a lounge lizard kind of guy. He takes a
quick look, and if the 80 bit is on, he heads off and does RB
validation; assuming the RB really is waiting and the current post would
satisfy the wait count, POST alters the RB status to indicate the RB is
now ready. But if the 80 bit is off... he assumes the "ECB" just isn't
being waited on, so he blasts X'40xxyyzz' (where xxyyzz is the post code
you supplied) into the "ECB" - again, see the quick-post code to grasp
the deep intelligence that's used.
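
The POST side, sketched the same way as a toy Python model. Assumptions: a dict of made-up 24-bit addresses stands in for RB validation (real validation isn't modelled), the model overwrites the word with X'40' plus a 24-bit code whether or not anyone was waiting, and a waiter becomes ready once its wait count reaches zero. Two ECBs pointing at one RB mimic a WAIT with a count of 2 on an ECB list.

```python
import struct

WAIT_BIT = 0x80000000
POST_BIT = 0x40000000


class RB:
    """Hypothetical stand-in for a request block: a wait count and a ready flag."""

    def __init__(self, wait_count: int = 1) -> None:
        self.wait_count = wait_count
        self.ready = False


def post(mem: bytearray, off: int, code: int, rbs: dict) -> None:
    """Toy POST. `rbs` maps 24-bit RB addresses to RB objects, standing in
    for RB validation (an assumption of this model)."""
    (old,) = struct.unpack_from(">I", mem, off)
    rb = None
    if old & WAIT_BIT:
        rb = rbs.get(old & 0x00FFFFFF)
        if rb is None:
            # What the real system does here is not modelled; the toy just raises.
            raise LookupError(f"no waiting RB at {old & 0x00FFFFFF:06X}")
    # Waiter or not, the word now reads X'40' plus the post code.
    struct.pack_into(">I", mem, off, POST_BIT | (code & 0x00FFFFFF))
    if rb is not None:
        rb.wait_count -= 1
        if rb.wait_count == 0:
            rb.ready = True  # enough posts: the waiter can be dispatched again


# A WAIT with a count of 2 on a list of two ECBs: both hold X'80' + the RB address.
rbs = {0x001000: RB(wait_count=2)}
ecbs = bytearray(struct.pack(">II", WAIT_BIT | 0x001000, WAIT_BIT | 0x001000))
post(ecbs, 0, 1, rbs)
print(rbs[0x001000].ready)  # False: one post short
post(ecbs, 4, 2, rbs)
print(rbs[0x001000].ready)  # True
```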

So point your POST macro at -any- 4 bytes (in your own key, assuming
you're not authorized, but otherwise just any old 4 bytes you want to
nuke) and good ol' Mr POST will cheerfully blast an X'40xxyyzz' into
them for you. No muss, no fuss, and absolutely no way to say "oops".
But wait, there's more.
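
A toy Python illustration of the "any old 4 bytes" point. post_unwaited is a made-up name that models only POST's not-waited-on path, and the victim here is an ordinary record field that was never an ECB. There's no error anywhere, just a quietly changed field.

```python
import struct

WAIT_BIT = 0x80000000
POST_BIT = 0x40000000


def post_unwaited(storage: bytearray, offset: int, code: int) -> None:
    """Toy model of POST's 'nobody is waiting' path. Nothing checks that
    these 4 bytes were ever meant to be an ECB."""
    (word,) = struct.unpack_from(">I", storage, offset)
    if word & WAIT_BIT:
        raise NotImplementedError("waiter path not modelled in this sketch")
    struct.pack_into(">I", storage, offset, POST_BIT | (code & 0x00FFFFFF))


record = bytearray(b"ACCOUNT=000123..")
post_unwaited(record, 8, 0)  # wrong address, same key, no complaint
print(record)  # bytearray(b'ACCOUNT=@\x00\x00\x0023..')
```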

Since there's literally no indication ANYWHERE that those 4 bytes are,
or ever were, an ECB, they could legitimately be asynchronously posted
by some other unit of work long after the thrill is gone. So let's say
that "function X" has called some asynchronous service and passed the
address of 4 bytes of private storage as an ECB. But assume, for grins
and giggles, that the async service can end with or without posting that
ECB, and/or that our "function X" can simply decide to bail out and not
wait on the ECB at all.

What happens next, and why should you care? Presumably that ECB is going
to get nailed sooner or later. So now there's a reasonable certainty
that 4 innocent bytes (probably belonging to some other "function Y" by
now) are going to get vaporized if/when that async service call
completes. Oops. That's another very common cause of those mysterious
overlay problems.
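
The late-post scenario as a toy Python timeline. AsyncService is a hypothetical stand-in that, like POST, keeps nothing but an address; "function X" and "function Y" are just comments marking who owns bytes 4-7 at each point, and "freeing" X's storage is implied by Y reusing it.

```python
import struct

POST_BIT = 0x40000000


class AsyncService:
    """Hypothetical async service that, like POST, remembers only an address."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.pending = []

    def start(self, ecb_offset: int) -> None:
        self.pending.append(ecb_offset)

    def complete_all(self, code: int = 0) -> None:
        # Post every remembered "ECB", whoever owns those bytes by now.
        for off in self.pending:
            struct.pack_into(">I", self.memory, off, POST_BIT | (code & 0x00FFFFFF))
        self.pending.clear()


memory = bytearray(16)
service = AsyncService(memory)

# "function X": bytes 4-7 are its ECB; it starts the request, then bails out.
struct.pack_into(">I", memory, 4, 0)
service.start(4)

# "function Y": later gets the same bytes and uses them as a counter.
struct.pack_into(">I", memory, 4, 1000)

service.complete_all()  # the late "post" lands on Y's counter
(counter,) = struct.unpack_from(">I", memory, 4)
print(counter)  # 1073741824 (X'40000000'), not 1000
```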

And then there's our friend Mr Cross-Memory Post. He's been the cause of
many a lost system. Now you're not just pointing at 4 random bytes in
your own address space; you have a whole system full of potential victim
address spaces. That's what the old TSO version of the battleships game
did, BTW. You could nuke 4 bytes at a time in the other guy's address
space until either of you got forced off or the system went belly up.
Sysprogs had fun back in the day, huh?

The last point is that there's no accountability. You could issue POST
against an "ECB" a hundred times and the owner of the ECB might perceive
any number from zero to a hundred posts. You would need to build some
sort of queuing mechanism in conjunction with WAIT/POST to ensure that
both sides saw each "event", even if they don't agree on the number of
times POST has been done. And while WAIT/POST are part of the operating
system, there's no standard way of doing that queuing, so everyone
invents their own on a case-by-case basis. And (surprise) they often get
it wrong.
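
A single-threaded Python sketch of both halves of that: a bare ECB collapsing three posts into one visible event, and one possible home-grown fix where a queue carries the events and the ECB only means "go look at the queue". Ecb and EventQueue are hypothetical names; a real multi-task version would also need the queue itself updated atomically (CS or PLO style), which this model ignores.

```python
from collections import deque

POST_BIT = 0x40000000


class Ecb:
    """Toy 4-byte ECB: a later post simply overwrites an earlier one."""

    def __init__(self) -> None:
        self.word = 0

    def post(self, code: int = 0) -> None:
        self.word = POST_BIT | (code & 0x00FFFFFF)

    def posted(self) -> bool:
        return bool(self.word & POST_BIT)

    def clear(self) -> None:
        self.word = 0


# Bare ECB: three posts before the owner looks, one event observed.
ecb = Ecb()
for code in (1, 2, 3):
    ecb.post(code)
seen = []
if ecb.posted():
    seen.append(ecb.word & 0x00FFFFFF)
    ecb.clear()
print(seen)  # [3]


class EventQueue:
    """One possible home-grown scheme: events live on the queue,
    the ECB just says 'something is on the queue'."""

    def __init__(self) -> None:
        self.items = deque()
        self.ecb = Ecb()

    def send(self, event) -> None:
        self.items.append(event)  # enqueue first...
        self.ecb.post()           # ...then post, so a woken owner finds it

    def drain(self) -> list:
        # Clear the ECB before draining: in a concurrent version, an event
        # added mid-drain then leaves the ECB posted for the next pass.
        self.ecb.clear()
        out = []
        while self.items:
            out.append(self.items.popleft())
        return out


q = EventQueue()
for event in ("a", "b", "c"):
    q.send(event)
print(q.drain())  # ['a', 'b', 'c']
```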

So when you lift the lid, you find it's a turd of an interface. Really.

CC

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
