I have PSM built and running on OpenVMS, but after viewing a couple of secure pages my PSM server crashes. I've tracked down what I believe is the source of the problem, and would like to know why it isn't a problem on other platforms (my only explanation is that, since this is a timing-related issue, everyone else is just getting away with it).

The code in CMT_EventLoop basically does this:

    while (select(controlSock))          /* wait for a message */
        CMT_ProcessEvent(controlSock);

The problem is that CMT_ProcessEvent is reading controlSock, and by some (necko?) means, we end up with another thread doing some of the reads on controlSock. This leaves the original thread free to come back around and, if it's fast enough (or in my case if ProcessEvent is slow enough), detect the tail end of the previous message and incorrectly call ProcessEvent again to start processing it as a new message. As expected, all hell then breaks loose.
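
To make the failure concrete, here is a small sketch of what happens when a reader starts parsing in the middle of a message. The framing (a 4-byte big-endian length followed by the body) and the names frame, parse_len and misframed_length are made up for illustration; this is not the actual PSM wire format or code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl */

/* Hypothetical framing: 4-byte big-endian length, then that many body bytes.
   Returns the total number of bytes written to out. */
static size_t frame(uint8_t *out, const char *body) {
    uint32_t len = (uint32_t)strlen(body);
    uint32_t be = htonl(len);
    memcpy(out, &be, 4);
    memcpy(out + 4, body, len);
    return 4 + len;
}

/* Interpret the 4 bytes at p as a message header and return the length. */
static uint32_t parse_len(const uint8_t *p) {
    uint32_t be;
    memcpy(&be, p, 4);
    return ntohl(be);
}

/* Returns the "length" a reader would see if it began parsing after
   another reader had already consumed 'consumed' bytes of the message. */
uint32_t misframed_length(size_t consumed) {
    uint8_t stream[64];
    size_t n = frame(stream, "ABCDEFGHIJKL");   /* one 12-byte message */
    assert(consumed + 4 <= n);                  /* stay inside the frame */
    return parse_len(stream + consumed);
}
```

With consumed set to 0 the header parses as 12, as it should. With consumed set to 8 the bytes "EFGH" from the body are taken as a header and the length comes out as 0x45464748, which is the kind of garbage I believe the second call to ProcessEvent is acting on.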

There is some locking going on in ProcessEvent and other places, but it doesn't keep the lock until the current message is completely read and processed. If it did, then maybe I wouldn't be seeing this problem.
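
What I have in mind is roughly the following: take the lock before reading the header and don't release it until the whole body has been read. This is only a sketch using a POSIX mutex and the same made-up length-prefixed framing; sock_lock, read_full, read_message and demo_two_readers are hypothetical names, not PSM functions, and PSM's own locking primitives are different.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read exactly n bytes; returns 0 on success, -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t n) {
    uint8_t *p = buf;
    while (n > 0) {
        ssize_t got = read(fd, p, n);
        if (got < 0 && errno == EINTR) continue;   /* interrupted, retry */
        if (got <= 0) return -1;
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

/* Read one whole message (header and body) while holding the lock, so no
   other thread can pick up the tail of it. Returns body length or -1. */
long read_message(int fd, char *body, size_t cap) {
    uint32_t be;
    long result = -1;
    assert(body != NULL);
    pthread_mutex_lock(&sock_lock);
    if (read_full(fd, &be, sizeof be) == 0) {
        uint32_t len = ntohl(be);
        if (len <= cap && read_full(fd, body, len) == 0)
            result = (long)len;
    }
    pthread_mutex_unlock(&sock_lock);
    return result;
}

struct reader { int fd; char body[32]; long len; };

static void *reader_main(void *arg) {
    struct reader *r = arg;
    r->len = read_message(r->fd, r->body, sizeof r->body);
    return NULL;
}

static int send_message(int fd, const char *body) {
    uint32_t len = (uint32_t)strlen(body), be = htonl(len);
    if (write(fd, &be, sizeof be) != (ssize_t)sizeof be) return -1;
    return write(fd, body, len) == (ssize_t)len ? 0 : -1;
}

/* Two threads read the same socket; returns 0 if each got one intact message. */
int demo_two_readers(void) {
    int sv[2];
    pthread_t ta, tb;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    struct reader a = { sv[0], "", -1 }, b = { sv[0], "", -1 };
    if (pthread_create(&ta, NULL, reader_main, &a) != 0) return -1;
    if (pthread_create(&tb, NULL, reader_main, &b) != 0) return -1;
    send_message(sv[1], "first");
    send_message(sv[1], "second!");
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    close(sv[0]);
    close(sv[1]);
    /* The two messages may arrive at either reader, but never split. */
    int a_first = a.len == 5 && b.len == 7 &&
                  !memcmp(a.body, "first", 5) && !memcmp(b.body, "second!", 7);
    int b_first = b.len == 5 && a.len == 7 &&
                  !memcmp(b.body, "first", 5) && !memcmp(a.body, "second!", 7);
    return (a_first || b_first) ? 0 : -1;
}
```

Because each reader keeps the lock for the entire header-plus-body read, the second thread can only ever start at a message boundary, whichever thread gets there first.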

Is the timing really so different on other platforms that they never run into this problem? Or am I way off base here?

Colin.
