This is going to be a somewhat preliminary "feeler" post because we are not yet able to fully describe or recreate the bug we're seeing, but I'm hoping some of you have seen something similar.
We use Apache::Session::File as the storage module for our Apache::Session sessions. I have written an object, RMS::Session (RMS is our app), that is basically a wrapper class around Apache::Session. When I instantiate a new RMS::Session, it ties to the actual Apache::Session, gets hold of the session hash, populates its member variables with values from that hash, and then unties and undefs the session hash. The result is a Perl object representing the session, with the friendly OO interface our developers are used to, while the real session is freed for use by other requests.

Every time I instantiate a new RMS::Session, I timestamp the Apache::Session and increment a 'retrievals' variable. Nearly every request into our app needs to look at the session for something, so sessions are tied and written a lot. In some cases a user will click into an area of the application that has, say, three frames, and the content of all three frames will look at the session, so three requests for the same session can arrive at the same time. That is probably exercising the locking mechanism fairly well.

Here's the basic problem we're seeing. Our sessions have a well-defined set of variables, so the size of the session file is very predictable: ours are always between 320 and 360 bytes. What seems to be happening is that, sometimes (more on this below), the file gets written out in a corrupted state. I've noticed that the "corruption" is well defined: the session file shrinks to either 150 bytes or 63 bytes. Once this happens, I can no longer retrieve any information from the session. The file is still there, but its contents are completely garbled.

Unfortunately, the problem is neither predictable nor easy to reproduce. First, it only happens occasionally.
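For anyone who wants to picture the setup, the wrapper pattern described above, plus the "three frames at once" scenario, might look roughly like the following. This is a hypothetical reconstruction, not our actual RMS::Session code: the directory paths, the `id` and `retrievals` accessors, and the fork-based driver that stands in for three concurrent frame requests are all assumptions for illustration.

```perl
#!/usr/bin/perl
# Hypothetical sketch: a wrapper that ties, copies, timestamps and unties an
# Apache::Session::File session, plus a driver simulating concurrent frames.
use strict;
use warnings;

package RMS::Session;    # illustrative reconstruction, not the real class
use Apache::Session::File;

# Assumed storage locations; both directories must exist before tying.
our %STORE_OPTS = (
    Directory     => '/tmp/rms_sessions',
    LockDirectory => '/tmp/rms_sessions_lock',
);

sub new {
    my ($class, $id) = @_;    # $id undef => create a brand-new session
    tie my %session, 'Apache::Session::File', $id, \%STORE_OPTS;

    # Per-request bookkeeping: timestamp and retrieval counter.
    $session{timestamp}  = time;
    $session{retrievals} = ($session{retrievals} || 0) + 1;

    # Copy the values into a plain object so the tie can be released.
    my $self = bless { map { $_ => $session{$_} } keys %session }, $class;

    # untie writes the modified hash back and releases the lock now,
    # rather than whenever %session happens to go out of scope.
    untie %session;
    return $self;
}

sub id         { $_[0]{_session_id} }
sub retrievals { $_[0]{retrievals} }

package main;

-d $_ or mkdir $_ or die "mkdir $_: $!" for values %RMS::Session::STORE_OPTS;

my $id     = RMS::Session->new(undef)->id;
my $frames = 3;    # e.g. a frameset whose three pages all read the session

for (1 .. $frames) {
    defined(my $pid = fork) or die "fork: $!";
    next if $pid;                        # parent keeps forking
    RMS::Session->new($id) for 1 .. 10;  # child: repeated tie/write/untie
    exit 0;
}
1 while wait() != -1;                    # reap all children

# Report the on-disk size of the session file and the stored counter.
my $file = "$RMS::Session::STORE_OPTS{Directory}/$id";
printf "session %s: %d bytes, retrievals=%d\n",
    $id, -s $file, RMS::Session->new($id)->retrievals;
```

Running something like this in a loop and watching the printed file size (ours should stay in the 320-360 byte range) would be one way to check whether concurrent access alone is enough to trigger the shrinkage.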
We haven't yet found a single sequence of actions that causes it every time. One test we use to demonstrate it is simply logging in and out several times: sometimes 7 or 8 logins go by without incident and then the 9th produces a corrupted session; other times it takes 10 logins in a row. Second, it happens far more frequently on our production server than on our development server, which runs exactly the same code and the same versions of Perl and all modules.

I've begun to suspect that perhaps it only shows up once there is enough latency. Since our production database holds much more data, operations take much longer there than in development. That may give a request more opportunity to ask for a session that is still held (locked) by a request being served by another Apache child. Like I said, this is still very preliminary.

Anyway, my question for now: has anyone seen corruption like this with Apache::Session::File in a typical multi-user mod_perl web app environment? We're just trying to narrow down the possibilities, since four engineers have now spent two days trying to find a reliable reproduction recipe, or any pattern to the bug, with no luck so far.

Thanks,
Fran