Warning: kind of a long-ish reply coming up.

> What exactly is wrong with the ECB/WAIT/POST mechanism? I think it has
> always worked great. It's got some pretty difficult limitations in cross
> memory mode (but it still works!), but otherwise I'm not sure where you
> get "awful".

Yeah, it works FSVO "works", but it is a 1960s design where the only real concern was a wait and a post, without much real asynchronous behavior going on. Check the SVC numbers (1 and 2). Pretty early on in that design cycle, wouldn't you say? Back then nobody thought of or even cared about all of the failure cases, and there are loads of them. Here are only a few.

Where do I start... how about the fact that an "ECB" is just 4 bytes (on a word boundary) of storage? There is no indication anywhere that those particular 4 bytes are really part of a serialization interface and not (say) 4 bytes in someone else's control block or working storage.

When you issue a WAIT macro, Mr WAIT only checks that each ECB is in the right key for you to wait on it and that it does not appear to have already been "posted". The big scientific check is whether the 40 bit is on. If so, he's going to just return immediately. That's why the famous "quick post" algorithm works. Otherwise WAIT is going to put the caller's RB into a wait state and blast X'80abcdefg' into the ECB, where "abcdefg" is (or would be) the address of the waiting RB. Of course, if that address wasn't really an ECB, you have an overlay. In most (but not all) cases that would be considered a programming error, so let's not quibble over it.

Now suppose the ECB happens to be in storage that the caller didn't own (say it's owned by a parallel task) and the storage owner terminates. Oops again. Now your task is waiting on an ECB that doesn't even exist anymore and no deity could wake it up. It is literally and permanently toast, and the only way to wake up the task is to detach it, which tends to be kind of draconian for the work that was supposed to be running on that task. This is a fairly common error in multitasking apps where parallel tasks serialize between each other via wait/post.

Now let's turn our attention to POST.

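Before getting to POST, here is a toy Python model of the WAIT behavior just described, for anyone who prefers code to prose. It is only a sketch: the dict called storage, the function name toy_wait, the addresses and the 30-bit address mask are all made up for illustration, and key checking and real RB handling are left out entirely.

```python
# Toy model only. "storage" is a plain dict of address -> 32-bit word,
# standing in for real storage; key checking and RB handling are left out.
WAIT_BIT = 0x80000000  # the "80 bit": an RB is waiting on this word
POST_BIT = 0x40000000  # the "40 bit": this word has been posted


def toy_wait(storage, ecb_addr, rb_addr):
    """Mimic the WAIT decision described above for a single ECB."""
    if storage[ecb_addr] & POST_BIT:
        # Looks posted already, so the caller never actually waits.
        return "returned immediately"
    # Not posted: whatever the word held is replaced by wait bit + RB address.
    # (Assumption of this model: the address fits in the low 30 bits.)
    storage[ecb_addr] = WAIT_BIT | (rb_addr & 0x3FFFFFFF)
    return "RB waiting"


storage = {
    0x1000: 0x40000000,  # a genuine ECB that was posted before the WAIT
    0x2000: 0x000003E8,  # not an ECB at all: somebody's counter, value 1000
}
print(toy_wait(storage, 0x1000, 0x007FF0E8))  # quick-post path
print(toy_wait(storage, 0x2000, 0x007FF0E8))  # overlays the counter
print(hex(storage[0x2000]))
```

The second call is the overlay case: nothing in the word says "I am not an ECB", so the counter is simply replaced.
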
Folks who routinely look at dumps (guilty, yer 'onor) tend to recognize things that "look like" an ECB in wait status because of the X'80abcdefg' pattern. But if an ECB isn't currently in a wait, then it's just 4 bytes of storage and the contents could be anything at all. Arguably an ECB is only really an ECB when it is being waited on.

Now Mr POST isn't fussy. He's kind of a lounge lizard kind of guy. He does a quick look and if the 80 bit is on, he heads off and does RB validation and, assuming it really is waiting and the current post would satisfy the wait count, alters the RB status to indicate the RB is now ready. But if the 80 bit is off... he assumes the "ECB" just is not being waited on, so he blasts X'40xxyyzz' (where xxyyzz is the post code you supplied) into the "ECB". Again, see the quick-post code to grasp the deep intelligence that's used. So if you point your POST macro at -any- 4 bytes (in your own key, assuming you're not authorized, but otherwise just any old 4 bytes you want to nuke), good ol' Mr POST will cheerfully blast a X'40xxyyzz' into it for you. No muss, no fuss and absolutely no way to say "oops".

But wait, there's more. Since there's literally no indication ANYWHERE that those 4 bytes are, or ever were, an ECB, they could legitimately be asynchronously posted by some other unit of work long after the thrill is gone. So let's say that "function X" has called some asynchronous service and passed the address of 4 bytes of private storage as an ECB. But assume for grins and giggles that the async service can end with or without posting that ECB and/or that our "function X" can simply decide to bail out and not wait on the ECB at all. What happens next and why should you care? Presumably that ECB is going to get nailed sooner or later. So now there's a reasonable certainty that 4 innocent bytes (probably belonging to some other "function Y" by now) are going to get vaporized if/when that async service call completes. Ooops.

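The same story as a toy Python model of POST, again only a sketch. The names toy_post, storage and ready_rbs, the addresses and the 30-bit address mask are invented for the example. RB validation and the wait count are reduced to "add it to a list", and leaving a waited-on word in posted form is an assumption of the model.

```python
# Toy model only: a dict stands in for storage, a list for "RBs made ready".
WAIT_BIT = 0x80000000  # the "80 bit": an RB is waiting on this word
POST_BIT = 0x40000000  # the "40 bit": this word has been posted


def toy_post(storage, ecb_addr, post_code, ready_rbs):
    """Mimic the POST decision described above for a single word."""
    word = storage[ecb_addr]
    if word & WAIT_BIT:
        # Looks like a waiter. The real service validates the RB and the
        # wait count; this model just records the RB as ready.
        ready_rbs.append(word & 0x3FFFFFFF)
    # Assumption of this model: either way the word ends up as
    # post bit + post code, i.e. the X'40xxyyzz' pattern.
    storage[ecb_addr] = POST_BIT | (post_code & 0x00FFFFFF)


# "Function X" hands this word to an async service as its ECB...
storage = {0x3000: 0x00000000}
# ...then bails out without waiting, and "function Y" reuses the storage.
storage[0x3000] = 0x000003E8  # function Y's counter, value 1000
ready = []
# Much later the async service completes and posts the stale "ECB".
toy_post(storage, 0x3000, 0x000004, ready)
print(hex(storage[0x3000]), ready)
```

Function Y's counter is gone, no RB was made ready, and nothing in the model can flag the post as stale.
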
That's another very common cause of those mysterious overlay problems.

And then there's our friend Mr Cross-Memory Post. He's been the cause of many a lost system. Now you're not just pointing at 4 random bytes in your own address space, you have a whole system full of potential victim address spaces. That's what the old TSO version of the battleships game did, BTW. You could nuke 4 bytes at a time in the other guy's address space until either of you got forced off, or the system went belly up. Sysprogs had fun back in the day, huh?

The last point is that there's no accountability. You could issue POST against an "ECB" a hundred times and the owner of the ECB might perceive any number from zero to a hundred posts. You would need to build some sort of queuing mechanism in conjunction with wait/post to ensure that both sides saw each "event" even if they don't agree on the number of times POST has been done. And while wait/post are part of the operating system, there's no standard way of doing that queuing, so everyone invents their own on a case by case basis. And (surprise) they often get it wrong.

So when you lift the lid, you find it's a turd of an interface. Really.

CC

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html

