Danny,
> In my opinion James socket problems would be greatly reduced in impact if
> James behaviour was as follows..
>
>   connections are accepted
>   -> resources are consumed
>   -> limits are approached
>   -> connections are refused
>   -> resources are freed
>   -> connections are accepted

Some of this capability is already present; it simply requires correct configuration. The <connections> sub-element <maxconnections> (newly introduced with the ConnectionManager change of a few weeks ago) lets you throttle the number of concurrent connections each server will accept. The current problem is that the misuse of the Scheduler requires that the maxconnections value be kept artificially low: with only five concurrent connections you can easily kill the Scheduler implementation under consistent load. (There's a rough sketch of the config block below.)

> In addition it concerns me that we can't run James under the -server JVM
> option on linux because Avalon causes a failure (attached message)
> Tomcat 3 under heavy and sustained load ends up with an out of memory
> exception, -server cures it, largely because of the more aggressive
> garbage collection.

It concerns me too. We should push the Avalon folks to figure out what the problem is. Possibly this would fix the Scheduler crash, possibly not. It seems doubtful to me, because the problem is that the global scheduler/timer keeps references to events that have already expired, so the garbage collector can never reclaim them. As far as I can tell the exact same problem exists with Harmeet's scheduler as with the previous one: the priority queue holds on to the events, eventually causing out of memory errors. This is one reason why I believe the scheduler is the wrong approach.

> In my opinion it is right for us to optimise our use of resources, but
> impossible to create a server that will sustain any load applied, what we
> need to do is ensure that the server will continue to function, even if
> this means rejecting connections.
> This route will provide a scalable and robust solution.

I don't disagree with this point, and a correctly configured server (after the watchdog fix) does this properly. Specifically, each service requires a base number of threads (~2) to function, plus either 1 or 2 threads per handler, depending on whether we're using the old code or the new code. The SpoolManager consumes the number of spool threads plus one. The NNTP repository consumes the number of spooler threads plus one. Fetchpop consumes a single thread. Sum all of that up based on your configuration and set the maximum size of your thread pool to that total. If you do, no problem. This is basically what I've been trying to work towards. (A worked example of the tally is below.)

Obviously James can't take arbitrarily high loads. But the current maximum load is well below what a real production system should be able to take, and the current behaviour of the server under overload is clearly not acceptable. The server needs to be robust.

How do we solve this problem? Proper configuration, and a source base that doesn't tip over from OutOfMemoryErrors. I believe the current patch helps alleviate this situation.

I understand you're having issues, and can only tell you that I am not. I'm happy to work with you to get through those issues, but I need more info on your configuration and assembly.
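For concreteness, the throttling block I mean looks roughly like the following. Treat it as a sketch rather than something to paste in: the limit of 30 is just an example, and exactly where the <connections> element sits depends on your config.xml and assembly.

    <connections>
      <!-- maximum simultaneous connections for this service; -->
      <!-- further connections are refused once this is reached -->
      <maxconnections>30</maxconnections>
    </connections>

And here is the kind of thread tally I mean, assuming the new code (2 threads per handler). The handler and spool-thread counts below are invented for illustration; plug in the numbers from your own configuration:

    SMTP:            2 base + 10 handlers x 2  = 22
    POP3:            2 base + 10 handlers x 2  = 22
    NNTP:            2 base +  5 handlers x 2  = 12
    SpoolManager:    5 spool threads + 1       =  6
    NNTP repository: 1 spooler thread + 1      =  2
    Fetchpop:                                     1
                                         total = 65

With that configuration the thread pool maximum should be set to at least 65; anything lower and the services will starve each other for threads under load.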
--Peter

P.S.: The problem from last night's test has been identified. Basically the problem lay in the spool: spool processing fell woefully behind the rate at which mail was coming in. This led to a multi-GB backlog in the spool of hundreds of thousands of files of ~1 KB each, which in turn caused O/S-level problems, as Win2k doesn't handle this very well. It's taken me well over an hour to attempt to delete these files, and I'm not done yet. But there is no indication of a problem with the handler, just a problem with the underlying O/S.
