> From: Vadim Gritsenko [mailto:[EMAIL PROTECTED]]
>
> > From: Berin Loritsch [mailto:[EMAIL PROTECTED]]
> >
> > <snip-a-lot/>
>
> ASSUMPTION:
>
> Poolable component is a component with high instantiation
> cost and state thus it can not be used in several threads
> simultaneously.
>
Don't forget the Per-Thread policy (not in ECM, but in the new Fortress package). It instantiates one instance of a component per thread, and you don't need pooling semantics to do it. So that assumption is too broad. Poolable components are for components that must be unique to each lookup. It is these types of components that should be changed. The Transformer is a perfect example of that.

> > However, the interfaces for the Cocoon pipeline components
> > are broken.
> >
> > A Generator should return an XMLSource, a Transformer should
> > return an interface that merges XMLSource and ContentHandler,
> > and a Serializer should return a ContentHandler.
>
> Right now Transformers are poolable. They have a state and they are
> (supposedly) heavy to new().

Some of them are. Their heaviness comes from the lifecycle they must go through before they are ready to be used. Some trivial Transformers that do not need context information, do not look up other components, and do not need to be configured are better off being created with new() every time.

> If you to change Transformer interface to return only
> XMLSource/ContentHandler, all the logic and state Transformer
> has moves into this XMLSource.

The state information moves into an artifact of the runtime system. This is as it should be. We can query the component for a unique instance of the XMLPipeline (the merger of XMLSource and ContentHandler), which opens the door to other performance-enhancing opportunities. For example, once the XSLT transformer has generated the template, it can use a cached version of it, and the logic makes sense. Consider this use case:

--------Current State----------
generator.setup(....);   // finds out the source info, etc...
transformer.setup(....); // finds out the source info, etc...
serializer.setup(....);
transformer.setContentHandler( serializer );
generator.setContentHandler( transformer );
generator.execute();

-------New Way-----------
XMLSource source = generator.getSource( type, .... );      // can cache at this point
XMLPipeline pipe = transformer.getPipeline( type, .... );  // can cache at this point
ContentHandler sink = serializer.getSink( type, .... );    // can cache at this point..if necessary
source.setContentHandler( pipe );
pipe.setContentHandler( sink );
source.execute();
-------------------------

It also helps in assembling the pipeline dynamically, with fewer lookups. Because we work with fairly generic types, we can take advantage of generative programming, such as using BCEL to generate a class that spits out SAX events (somewhat like XSP, but better), and have the caching system do that.

The Generator component's responsibility then becomes managing these artifacts rather than actually doing the work. The new way would probably add a GeneratorManager for this purpose. However, the artifact returned is preinitialized with everything it needs. The GeneratorManager, TransformerManager, and SerializerManager can all take care of usage semantics if they hand out pooled items. Put another way, it would be *more* correct to return artifacts to a specific manager than to a lookup mechanism. What we want to restore is the separation of concerns for the lookup mechanism. The CM (ComponentManager) was designed only as a lookup mechanism, not as a container.

>
> Thus, XMLSource becomes heavy and Transformer light.
> Obviously, Transformer becomes ThreadSafe (which is good) and
> XMLSource must be made Poolable (its heavy, it is stateful).

Not necessarily. There are other optimization opportunities at the system level that would not present themselves otherwise.

> Instead of having one component we ended up with two. Please
> tell me I see things wrongly.
>
> <snip what="simple pipeline"/>

You end up with one management component and the artifacts it returns. Those artifacts can be cached results, compiled XML streams, C2 Generators, etc. We are no longer limited by our architecture.
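A minimal Java sketch of the "New Way" chaining, assuming that XMLSource, XMLPipeline, the three *Manager classes, GreetingSource, UpperCasePipeline and TextSink are hypothetical names and signatures made up for illustration (they are not Cocoon's API); only the org.xml.sax types are real. The managers hold no per-request state and only hand out fresh, fully initialized artifacts. XMLPipeline is built on org.xml.sax.helpers.XMLFilterImpl, which already forwards every SAX callback to the next ContentHandler.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical artifact types: names follow the message, signatures are assumptions.
interface XMLSource {
    void setContentHandler(ContentHandler handler);
    void execute() throws SAXException;
}

// Both a consumer of SAX events and a producer for the next stage.
interface XMLPipeline extends ContentHandler {
    void setContentHandler(ContentHandler next);
}

// Generator artifact (hypothetical): emits <greeting>text</greeting>, already configured.
class GreetingSource implements XMLSource {
    private final String text;
    private ContentHandler handler;

    GreetingSource(String text) { this.text = text; }

    public void setContentHandler(ContentHandler handler) { this.handler = handler; }

    public void execute() throws SAXException {
        handler.startDocument();
        handler.startElement("", "greeting", "greeting", new AttributesImpl());
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement("", "greeting", "greeting");
        handler.endDocument();
    }
}

// Transformer artifact (hypothetical): XMLFilterImpl forwards every callback
// to the next ContentHandler; only characters() is overridden to upper-case text.
class UpperCasePipeline extends XMLFilterImpl implements XMLPipeline {
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] upper = new String(ch, start, length).toUpperCase().toCharArray();
        super.characters(upper, 0, upper.length);
    }
}

// Serializer artifact stand-in: collects character data instead of writing bytes.
class TextSink extends DefaultHandler {
    final StringBuilder out = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }
}

// Hypothetical thread-safe managers; a real one could cache or pool the artifacts.
class GeneratorManager {
    XMLSource getSource(String type) { return new GreetingSource("hello " + type); }
}

class TransformerManager {
    XMLPipeline getPipeline(String type) { return new UpperCasePipeline(); }
}

class SerializerManager {
    TextSink getSink(String type) { return new TextSink(); }
}

class PipelineDemo {
    static String run() {
        XMLSource source = new GeneratorManager().getSource("world");
        XMLPipeline pipe = new TransformerManager().getPipeline("upper");
        TextSink sink = new SerializerManager().getSink("text");
        source.setContentHandler(pipe);
        pipe.setContentHandler(sink);
        try {
            source.execute();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints HELLO WORLD
    }
}
```

The design choice here is that no artifact is ever looked up or released through the component manager: each manager decides whether to build, cache, or pool what it returns.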
We can have more intelligent operations on the pipeline components.

> > As the ContentHandler.endDocument() is called on each item, they
> > are automatically returned to their pools.
>
> Two issues on this one:
>
> 1. endDocument might be never be called. I can discard
> component after evaluating its cache ID or cache validity.
>
> 2. endDocument does not necessarily indicates that I'm done
> with this component. Simple example: you are using serializer
> to serialize xml fragment 100 times. It would be logical to
> make a loop:
>
> serialier = lookup();
> for(;;){
>   serializer.setDestination();
>   serializer.startDocument();
>   ...
>   serializer.endDocument();
> }
>

Wrong application. It is a Transformer's job to modify the XML so that you have an XML fragment repeating 100 times. The Serializer should only operate on the XML given to it. A serializer should _*never*_ modify the content of the XML. It can only modify the binary stream's representation of it.

> > As to timeouts, we can use one policy for the container type. For
> > example, Cocoon would benefit from a request based approach.
>
> What if processing continues after sending response?
> I.e., after endDocument() on serializer, some work is done in
> transformer? Like invoking other serializer?

Then you have broken Cocoon's design. A Transformer does not invoke a Serializer. Ever. It is the Sitemap's responsibility to manage all pipelines, whether they have branched or not. Once all processing for a request is done, and the sitemap (or at least the Cocoon container) knows this unequivocally, it can reclaim the components.

> > Other
> > containers may have to use a timeout based approach. Its up to the
> > container. Are timeouts sufficient? No. Does it add additional
> > complexity for the container? Yes. Does it help the developer?
> > absolutely.
>
> There are situations when transaction takes hours to process
> (I do not mean DB transaction here). How this will happen?

Wow. Hours?
Then you need to think of a different way of handling that transaction. That is a deeper design issue that needs serious thought for that application.

> > > But component state is lost in the "refresh". Meaning that for a SAX
> > > transformer or *any other component with state* you have screwed up
> > > the processing. (So don't allow components with state, then - well,
> > > then they are all ThreadSafe and we do not need
> > > pools.)
> >
> > See above. The Cocoon pipeline component interfaces are really
> > screwed up in this respect. A component's state should be sufficient
> > per thread.
>
> Thread can require several components of the same type to do
> its work. How this will be handled?

Use the *Manager approach above (GeneratorManager, TransformerManager, SerializerManager). If you need a unique instance of a component for each lookup, then there is probably something wrong in your design.

> > Anything that is more granular than that needs a
> > different treatment.
>
> What could it be?

The *Manager approach outlined above.

> > > The basis of GC is that you can unambiguously tell when an object is
> > > no longer used - when it can not possibly be used. The speedups we
> > > have in pooling is due to explicitly telling the container that this
> > > object can be reclaimed, thus keeping the object count low.
> >
> > In Cocoon we have the advantage of knowing that. A pipeline component
> > cannot possibly be used past the processing of a request.
>
> Some transformers use instance of serializers to do its work.
> It could be looked up on startup and returned on shutdown (to
> speedup processing
> - right now manager.release() is quite expensive operation),
> and will not depend on request/response cycles.

:) And now you are getting why we need to design our components so that we do not need to release() them. BTW, the Fortress container has a much shorter release() cycle because it handles the logic asynchronously.
It may take a little longer getting the instance into the pool, but it doesn't affect the critical path. However, if a Transformer directly uses a Serializer, then something is wrong. That was never the intention of the Cocoon component model.
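The reclamation scheme described in this thread can be sketched in Java: the container reclaims every component once it knows a request is finished, and release() only queues the work so the bookkeeping stays off the critical path. AsyncPool, RequestScope and ReclaimDemo are hypothetical names. This is not Fortress's actual implementation, only an illustration of the idea built on java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical pool: release() only enqueues a task; a background thread
// puts the instance back into the idle queue, off the request's critical path.
class AsyncPool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final ExecutorService releaser = Executors.newSingleThreadExecutor();
    private final Supplier<T> factory;

    AsyncPool(Supplier<T> factory) { this.factory = factory; }

    T acquire() {
        T item = idle.poll();
        return item != null ? item : factory.get();
    }

    void release(T item) {
        // A real pool would also reset/recycle the instance in this task.
        releaser.execute(() -> idle.offer(item));
    }

    int idleCount() { return idle.size(); }

    void shutdown() throws InterruptedException {
        releaser.shutdown();
        releaser.awaitTermination(5, TimeUnit.SECONDS);
    }
}

// Hypothetical request scope: records everything a request acquired and
// reclaims it all once the container knows the request has finished.
class RequestScope {
    private final List<Runnable> releases = new ArrayList<>();

    <T> T acquire(AsyncPool<T> pool) {
        T item = pool.acquire();
        releases.add(() -> pool.release(item));
        return item;
    }

    void end() {
        releases.forEach(Runnable::run);
        releases.clear();
    }
}

class ReclaimDemo {
    static int run() {
        AsyncPool<StringBuilder> pool = new AsyncPool<>(StringBuilder::new);
        RequestScope request = new RequestScope();
        request.acquire(pool).append("transformer state");
        request.acquire(pool);
        request.end(); // request finished: both instances are queued for return
        try {
            pool.shutdown(); // waits for the queued release tasks to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.idleCount();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2
    }
}
```

The component code never calls release() itself; the scope owns that decision, which is the point of designing components so that they do not need to be released explicitly.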