Stefano Mazzocchi wrote:

> Hello people,
>
> I'm currently at Giacomo's place and we spent a rainy afternoon
> profiling the latest Cocoon to see if there is something we could
> fix/improve/blah-blah.
>
> WARNING: this is *by no means* a scientific report. But we have tried
> to be as informative as possible for developers.
>
> We were running Tomcat 4.1.10 + Cocoon HEAD on Sun JDK 1.4.1-b21 on
> linux, instrumented with Borland OptimizeIt 4.2.
>
> Here is what we discovered:
>
> 1) Regarding memory leaks, Cocoon seems absolutely clean (for cocoon,
> we mean org.apache.cocoon.* classes). Avalon seems to be clean as
> well. Good job everyone.
>
> 2) we noticed an incredible use of
> org.apache.avalon.excalibur.collections.BucketMap$Node. It is *by far*
> the most used class in the heap. More than Strings, byte[], char[] and
> int[]. Some 140000 instances of that class.
>
> The number of BucketMap nodes grows linearly with the number of
> different pages accessed (as they are fed into the cache), but even a
> cached resource creates some 44 new nodes, which are later garbage
> collected.
>
> 44 is nothing compared to 140000, but still something to investigate.
>
> So, discovery #1:
>
> BucketMaps are used *a lot*. Be aware of this.
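For those who haven't looked at the class: a BucketMap is basically a chained-bucket hash table, so every distinct entry costs one Node allocation, and every lookup walks a chain of them. Here is a rough sketch of that design, purely illustrative, not Excalibur's actual code:

```java
// Illustrative sketch only (not Excalibur's real BucketMap): a chained-bucket
// hash map allocates one Node object per distinct entry, which is why heavy
// component lookup traffic shows up as BucketMap$Node instances in the heap.
public final class BucketMapSketch {
    private static final class Node {
        final Object key;
        Object value;
        Node next;
        Node(Object key, Object value) { this.key = key; this.value = value; }
    }

    private final Node[] buckets;
    private int size;

    public BucketMapSketch(int capacity) {
        buckets = new Node[capacity];
    }

    public void put(Object key, Object value) {
        int i = (key.hashCode() & 0x7fffffff) % buckets.length;
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) { n.value = value; return; } // update in place
        }
        Node fresh = new Node(key, value); // one allocation per new entry
        fresh.next = buckets[i];
        buckets[i] = fresh;
        size++;
    }

    public Object get(Object key) {
        int i = (key.hashCode() & 0x7fffffff) % buckets.length;
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) return n.value;
        }
        return null;
    }

    public int size() { return size; }
}
```

So the 140000 instances would simply mean 140000 live map entries somewhere, and the 44 transient nodes per cached request presumably come from maps populated during request handling and thrown away afterwards.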
AFAIK, BucketMaps are used as soon as a component is looked up, and getting a page from the cache shouldn't reduce the number of lookups much, since the pipeline has to be built to get the cache key and validity.

<thinking-loudly>

What could save some lookups is to have more ThreadSafe components, including pipeline components. For example, a generator could theoretically be ThreadSafe (it has mainly one generate() method), but the fact that setup() and generate() are separated currently prevents this.

Also, we have to consider that a component lookup is more costly than instantiating a small object. Knowing this, some transformers and serializers can be thought of as factories of lightweight content handlers that do the actual job. These transformers and serializers could then also be made ThreadSafe and thus avoid the per-request lookup.

This would require some new interfaces, which should coexist with the old ones to ensure backwards compatibility.

Thoughts?

</thinking-loudly>

> 3) Catalina seems to be spending 10% of the pipeline time. Having
> extensively profiled and carefully optimized a servlet engine (JServ)
> I can tell you that this is *WAY* too much. Catalina doesn't seem like
> the best choice to run a loaded servlet-based site (contact
> [EMAIL PROTECTED] if you want to do something about it: he's working on
> Jerry, a super-light servlet engine based on native APR and targeted
> especially for Apache 2.0)

www.betaversion.org has been down for several weeks now...

> 4) java IO takes something from 20% to 35% of the entire request time
> (reading and writing from the socket). This could well be a problem
> with the instrumented JVM since I don't think the JDK 1.4 is that slow
> on IO (especially using the new NIO facilities internally)
>
> 5) most of the time is spent on:
>
> a) XSLT processing (and we knew that)
> b) DTD parsing (and that was a surprise for me!)
>
> Yeah, DTD parsing. No, not for validation, but for entity resolution.
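(To make the <thinking-loudly> above a bit more concrete: here is a rough sketch of the factory idea. The interface names are made up for illustration only, none of this is an existing Cocoon API.)

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical interface (not an existing Cocoon API): a serializer that
// could be looked up once and shared, because it holds no per-request
// state itself. It hands out a fresh lightweight handler per request.
interface HandlerFactory {
    ContentHandler newHandler();
}

// Example implementation: the factory is stateless, so one shared
// instance is safe across threads; all per-request state lives in the
// cheap handler object it allocates.
final class NullHandlerFactory implements HandlerFactory {
    public ContentHandler newHandler() {
        // a small allocation instead of a full component lookup
        return new DefaultHandler();
    }
}
```

The point being that the per-request cost would drop from a component lookup/release cycle to a small object allocation, which, as noted above, is cheaper.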
> It seems that even if the parser is non-validating, the DTD is fully
> parsed anyway, just to do entity evaluation.
>
> So, discovery #2:
>
> Be careful about DTDs even if the parser is not validating.
>
> Of course, when the cache kicks in and the cached document is read
> directly from the compiled SAX events, we have an incredible speed
> improvement (also because entities are already resolved and hardwired).
>
> 6) Xalan incremental seems to be a little slower than regular Xalan,
> but on multiprocessor machines this might not be the case [Xalan uses
> two threads for incremental processing]
>
> NOTE: Xalan doesn't pool threads when it does that!
>
> So, while perceived performance is better for Xalan in incremental
> mode, the overall load on the machine is reduced if Xalan is used
> normally.
>
> 7) XSLTC *IS* blazingly fast compared to Xalan and is much less
> resource intensive.
>
> Discovery #3:
>
> use XSLTC as much as possible!
>
> NOTE: our current root sitemap.xmap indicates that XSLTC is the default
> XSLT engine for Cocoon 2.1, but the fact is that the XSLTC factory is
> commented out, resulting in running Xalan. We should either remove
> that comment or uncomment the XSLTC factory.
>
> I vote for making XSLTC default even if this generates a few bug reports.

+1

> 8) Cocoon's hotspot is.... drum roll.... URI matching.
>
> TreeProcessor is complex and adds lots of complexity to the call
> stacks, but it seems to be very lightweight.

I'm happy to hear that :-)

The TreeProcessor was designed to be as fast as possible, even if
interpreted: pre-process everything that can be, and pre-lookup
components when they're ThreadSafe. Call stacks can be impressive, but
each frame performs very few computations.

> It's URI matching that is the thing that needs more work
> performance-wise.
>
> Don't get me wrong, my numbers indicate that URI matching takes from 3%
> to 8% of response time.
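(A concrete note on discovery #2: Xerces exposes a feature to skip loading the external DTD altogether in a non-validating parse. A small sketch, with the obvious caveat that entities declared in that external DTD will then no longer resolve, so it is only safe for documents that don't depend on them.)

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: non-validating parse that doesn't fetch or parse the external
// DTD at all. Caveat: entity references defined in that DTD will fail to
// resolve, so only use this for documents that don't rely on them.
public final class DtdSkipDemo {
    public static boolean parseQuietly(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setValidating(false);
            // Xerces-specific feature: don't load the external DTD.
            spf.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

A gentler alternative is to keep DTD parsing but serve the DTDs from a local catalog through an EntityResolver, so that at least nothing gets fetched over the wire.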
> Compared to the rest this is nothing, but since this is the only thing
> we are in total control of, this is where we should concentrate
> profiling efforts.

Do you mean the WildcardURIMatcher? Is this related to the matching
algorithm, or to the number of patterns that have to be tested for a
typical request?

> Ok, that's it. Enough for a rainy swiss afternoon.
>
> Anyway, Cocoon is pretty optimized for what we could see. So let's be
> happy about it.

Have you compared the respective speeds of 2.0.x and 2.1 on the same
application? It would be interesting to know whether 2.1 performs better
than its ancestor.

Sylvain

-- 
Sylvain Wallez                        Anyware Technologies
Apache Cocoon                         http://www.anyware-tech.com
mailto:[EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]