----- Original Message ----- From: "Peter M. Goldstein" <[EMAIL PROTECTED]> > > Basically this code is designed to address a bunch of > > memory/robustness/scalability issues. These issues were, at least in > > part, due to problems with the Cornerstone TimeScheduler. Dependence on > > the global TimeScheduler, previously used to handle idle timeouts, has > > been removed. Instead, James now uses the classes of the new > > org.apache.james.util.Watchdog package to provide this functionality. > > I have been using a scheduler implementation for 1+ year without any > issues. > If you don't trust a scheduler implementation why not replace that with > another one ? At least one has already been posted. > One of the advantages of Avalon is that you can replace an impl. I thought > the scheduler is a good abstration and widely used. > > What are the advantages of picking another abstraction to solve the same > problem ?
Because the scheduler is the wrong abstraction. This is not a scheduler problem - it's a timeout. A timeout is far simpler than a scheduler. And that's the core issue. [Harmeet] One time timer task rather than repeated timer tasks is the same thing as Watchdog timeout. It is property of implementation. ------------------------------- A scheduler is required to keep track of a number of possible events, each of which may trigger at an arbitrary time. It is centralized, and has complex synchronization issues when large numbers of events are added or deleted frequently. Schedulers are generally intended for events that are long lived and are expected to actually occur. [Harmeet] Depends on how the scheduler is used. Centralized-complex etc. are properties of implementation. Schedulers can work like one-time alarm clocks. ------------------------------- A timeout is the opposite problem. The connections and the corresponding handlers are extremely short lived - on the average they live for milliseconds. Under load hundreds of events will be added or deleted per second. In the overwhelming majority of cases the timeout should not trigger. It is designed to handle the exceptional case, not the general one. [Harmeet] If you have an object that monitors timeout on lots of events, it would have the same property and load. It may be called a watchdog or a scheduler. You can have a scheduler implementation with the same properties as watchdog. ------------------------------- Thus a timeout should be cheap to invoke (lightweight, involving just a couple of small objects and drawing from an efficient thread pool) and it should be optimized for the general case (the timeout thread waits for the timeout period, rather than consuming resources by context switches and unnecessary loops). There should be minimal contention between different handlers, as their timeout events are independent. So rather than using a globally shared scheduler resource with frequent invocations, you grab a thread from a shared pool and then face no potential contention with other handlers for the life of the watchdog. The scheduler model fails on all of the above. [Harmeet] lightweight-effiicient etc. are properties of implementation. The scheduler abstraction cannot enforce or avoid any of these. The point is that it may be better to change the scheduler implementation and use(one scheduler per class of handlers) than to invent another mechanism. If another mechanism is needed, can it fit inside the same abstraction ? To me scheduler/Watchdog and JDK Timer Task are very similar. The difference is mostly in the implementation. If you have one scheduler implemented for a class of handlers, that would do the trick equally well. If you reset a scheduler before it is fired, the trigger would not occur. ------------------------------- > > > > In addition, there was substantial rearchitecting of the service > > structure. James services are now subclasses of the new > > AbstractJamesService class. This class serves to unify the common > > functionality of the services. BaseConnectionHandler is removed, > > because the functionality once provided by that class has been moved > > into AbstractJamesService. The Configurable interface (and the > > Earlier the intent was to have thin server and move towards a base handler > class. I don't see the benefit in rearchitecture. Did the current system > not > meet a requirement ? > In the past there has been conversation about removing derived Servers > classes and making do with handlers. There were a number of very serious problems with the handler heavy approach. First, there was the simple expense. Server objects are created once per service, handler objects are created, fed through the lifecycle, and destroyed multiple times a second. Thus it is advantageous to move as much of the lifecycle expense into the server code. Second, there's a bizarre division of configuration information. Not only was the configuration information divided between handler and server in some bizarre fashion (why does helloName belong to the handler and not to the server?), but common configuration between the servers was duplicated (and may I add, duplicated incorrectly, leading to yet another long standing NNTP bug). The addition of the AbstractJamesService class correctly handled that common server configuration (extracting socket type, port, enabled, etc. in a uniform fashion) while allowing custom per-server type configuration. The BaseConnectionHandler class attempted to encapsulate some of that common behavior, but by the very fact it was a handler class rather than a server class it couldn't encapsulate the port #, socket type, connection type, and thread pool used for connection. Those have to be in the class that provides the service. So why should the helloName and the timeout be in the BaseConnectionHandler as opposed to being in the aforementioned server class? [Harmeet] A Config object can take care of this. Handlers don't need to expose themselves to lifecycle. What you are describing is configuration per handler and as you pointed out that is not needed. You are also saying configuration is spread out, that can be centralized too in a config object. ------------------------------- Third, the sheer conceptual weight. For the reasons described above, the BaseConnectionHandler class cannot be the end-all/be-all class. Thus, why is it necessary? The answer - it isn't. It just further confuses the architecture. And it perpetuates the idea that you can mate any ConnectionHandler to any service. As that idea is false (see below), it just serves as a point of confusion in the code base. [Harmeet] Listener-dispactch(handler) is the pattern followed in current code. It is more flexible than server class hierarchy. The missing piece is confusing and repeated configuration of the handler not a problem with the pattern iteself. Most other servers try to separate the listeners from dispatch workers/handlers. ------------------------------- > > corresponding behavior) has been removed from all the ConnectionHandler > > children, in recognition that configuration is really on a per-server, > > not per-handler, basis. > > A far simpler solution would have been to have a single params object per > handler or optimize configuration code. This would have delivered the > speed > gain without a new architecture. Each philosophy would have holes and I > don't see big enough of a hole in the hander mechanism to require a redo. No, that wouldn't work. The solution described is not simple at all. Adding a "params" object would either require re-validation and defaulting in each handler invocation (resulting in the original expense), or would basically be a non-type safe (and hence inferior) version of the changes already introduced. The servers are different (and the handlers are not interchangeable) for precisely this reason. Different handlers require different param values/services. [Harmeet] Params/Config need not require revalidation and if they do there will be the overhead that you are avoiding by server class hierarchy. ------------------------------- As for optimizing the configuration code, there is nothing to optimize. The pre-existing Configuration code was basically as optimized as possible. It was just getting invoked hundreds of times per second rather than once per the lifetime of the server. There's no way to beat that math. [Harmeet] Cache the config. Precalculation for handlers, isn't that the only gain in having a server class hierarchy. Maybe that idea can be abstracted and used with current model. ------------------------------- > > This leads to tighter coupling between the > > Handlers and the corresponding server classes, but that's simply > > recognition of a pre-existing reality. These classes are extremely > > tightly coupled. > > Server is a listener. Handler contains protocol specific stuff. Why would > there be a (tight) coupling ? > It is a listen-dispatch mechanism. Shouldn't it be as loosely coupled as > possible ? This is a somewhat na�ve view of a service architecture. Servers aren't generic. If they were, we'd need only one server class period. But servers are different from one another, so we need separate server classes. You can see how the servers differ by examining the diffs for the assorted server classes that were submitted with this patch. [Harmeet] No I am saying listeners are generic. Handlers/Workers are specific. A mix of two makes service and servers. ------------------------------- In addition, a service is not simply the protocol. Specifically, a handler that implements the SMTP protocol does not an SMTP service make. There are two additional constraints. [Harmeet] But a socket listener associated with SMTP handler makes an SMTP Service. That is the current state. An incremental and better system would be Socket Listener with Config Object and Handler makes a Service. A Bad State would be a server class hierarchy that couples Listener, Config and Handler. ------------------------------- First, there are service specific details that are not described by the protocol. These generally must be accessed and invoked before any handler is created. The obvious example is the default port for the service. You need the default port when you're getting the config to open the socket. [Harmeet] Port, queus size etc are the only listener concerns. ------------------------------- Second, there are resources provided by the server that depend on the configuration. Yes, it's possible to re-read the configuration on a per handler invocation basis. But that's a bad idea for the reasons described above. [Harmeet] That is why I am suggesting params or configuration object, cached for a class of handlers. ------------------------------- Also, the very design of the Phoenix server makes this complicated. Try changing the handler so that it has a new dependency. You'll find that you need to alter the .xinfo for the server class as well as its assembly descriptor. That's part of what I mean by a tight coupling. Phoenix knows about, loads, and determines dependencies from the server class (which is a Phoenix block). The handler class is unknown to the underlying server. It can't be a block, because it's invocation and lifecycle is not controlled by Phoenix. So it can't publish dependencies to Phoenix. [Harmeet] Are you saying Avalon is too complicated. Avalon complexity or user friendliness is not a reason to change a fairly standard listener-dispatch model. If there are problems with Phoenix, would it be better to not use it ? That is a different topic. Complexity in one part should ideally not be reason a rearchitect another part. ------------------------------- > > > > The config.xml entries were not modified to remove the > > <handler></handler> tags, to maximize backwards compatibility. These > > tags don't really make sense, even before this set of changes, and > > should be removed in a future version. > > > > Some enhancements, including the ability to specify a thread group for > > use by a particular service and the ability to do more fine-grained > > connection management (i.e. to have more than one set of SSL > > connections, each for use by different services) have been added. These > > will be further documented, but are basically encapsulated in the > > AbstractJamesService class. > > I thought the capability to create n listeners for each service already > existed. No, it doesn't. And never did. There is no capacity, either in the implemented services or their abstract base class, to have multiple handlers for a given service. Even if the code was there, the configuration and implementation would preclude it as there is only a single port entry for each service (so you can't distinguish between handlers based on port) and the connection handler is created and invoked by the connection manager sans parameters. [Harmeet] You can have different services. POP3 and POP3 Secure. Doesn't that achive what you are describing. If you POP3 is a service, POP3 (plain) and POP3 Secure are 2 instances of it. It depends on what you mean by a service. Effectiely if you want different listereners/threadgroups for a protocol, that exists. ------------------------------- As far as the ability to handle multiple socket types, that wasn't there. The socket type was one of two hard coded strings, and the SSL selection was Boolean. So if you wanted to have two different SSL socket types, one for use by the NNTP server and another for use by the SMTP server (a very common configuration), then you would have to run two separate servers. Very bad. [Harmeet] It is pretty standard. Take a look at tomcat. They do the same thing. ------------------------------- > > > > There was some change to the behavior of the SMTP server - specifically, > > the number of bytes required to reset the idle timer is now measured > > against the total number of bytes read off the socket, not simply those > > read in the DATA command. This substantially neatens the code and in > > real life scenarios doesn't make a whole lot of difference. > > > > There is also a partial redress of bug #13316 in the NNTP code, which > > will reset the watchdog every 100 headers it transmits. > > > observer-observable model would be a good fix. I don't think so. This is not really a problem of observer/observable. I mean do you want the handler to register itself as an observer of the stream? The stream contains the information of how many bytes it has transmitted. The handler knows how many headers it has written. The handler knows how long its idle timeout is. The handler knows when it has written a complete command. The handler has the socket which is to be closed on timeout. So which is the observer and which is the observable? Both the handler and stream have data that would be associated with the observable (i.e. how long is the timeout and when to reset). [Harmeet] Basically if n lines have been written out, observer resets the timer. rather than only after all the lines have been written out. ------------------------------- The problem here is really architectural - the NNTP server should solve this problem in the same way the POP3 and SMTP servers solve the problem. But I really don't want to spend any more time doing NNTP reworking. The partial redress is just a quick hack because there is no Writer that handles counting bytes and timing out. [Harmeet] The observer-observable is a clean and standard pattern way. It allows an observer to monitor and decide on timeout reset. ------------------------------- > > > > I've done basic testing on this code, and I am now doing further load > > testing. I will continue to do testing before submission. > > > > The code is attached as a set of diffs, and a collection of newly added > > files. > > > > As always, comments, critiques, and questions are welcome. > > I don't follow the need for a new scheduler abstraction or re design of > server-handler interaction. Maybe a bit of pro-con or a discussion on > different options and what makes one better than other would help. See the above. -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
